Thursday, November 14, 2013

Intro to the MEAN Stack - Part 1 - The Data

I recently changed jobs to join a startup. One of the many reasons I took the job is that the software is built with the MEAN stack – MongoDB, Express.js, AngularJS and Node.js. Prior to joining the company I had dabbled in each part of the stack, but I hadn't used any of them on a 'real' project. So, to reinforce what I'm learning during the day, I'm using the MEAN stack to build an app to view the stats for the New York/San Francisco Giants, and I thought I would share my experiences to hopefully help someone else who's learning the MEAN stack. This is the first post in a three part series that will walk you through building the app. The planned posts are:

Part 1 – The Data: Converting the MySQL data model to a model suitable for MongoDB.

Part 2 – The API: Building out a Node.js based API that will allow us to retrieve the stats.

Part 3 – The UI: Building out an AngularJS UI.

Goal

The goal for this post is to have the data modeled and loaded into a MongoDB database so we can use it in the next post.

Setup

If you want to follow along with this post you will need to download and install MongoDB and Node.js. Once you have installed mongo and node you can download the part 1 code. Keep in mind that before you run the load scripts you will need to install the node packages. To do that, simply change into the <YOUR CODE DIR>/post-1-the-data/scripts directory and run:

npm install

npm is the node package manager. The install option tells npm to read the package.json file and install any of the requirements that have not already been installed. If you plan on loading the Giants data I have already parsed out, you are good to go. However, if you want to start from the beginning yourself, you will need to download the data from the Baseball Databank project. The most up to date branch is the 2012update branch. After cloning the repository, read the scripts/README.md and you will be ready to generate your own Giants data or any other team's data.

My Two Second MongoDB Introduction

Before I dive into the meat of the post I want to give you a very brief MongoDB intro. We will be storing data in collections, which are analogous to tables. Each collection will contain documents, which you can think of as rows in a relational database. As the NoSQL term implies, we will not be using SQL to retrieve the data. Instead we will use JavaScript.
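For example, a lookup that might be written in SQL as SELECT * FROM players WHERE nameLast = 'Mays' becomes a JavaScript call in the mongo shell (the collection and field names here are just for illustration; they match the ones used later in this post):

    db.players.find({ nameLast: 'Mays' })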

The Data

As a kid who grew up reading box scores every morning while I ate my breakfast, I am very thankful that there is an open source project out there dedicated to providing the statistics for Major League Baseball. It is distributed as 24 CSV files, and each file maps to the MySQL table that was used to generate it. All 24 data files describe either a manager, a player or a team. So as we build out the database we will create and load three collections: managers, players, and seasons.

Each collection will house documents that have been designed so that one document represents one player, one manager or one season. This will make our development of the API much easier; for the most part one call should retrieve everything we need. To give you an idea of what type of data the documents will contain, I've created a map between the collections and the files. Remember, each file is a table in the Baseball Databank database. In some cases you'd have to write some pretty complicated joins to get the data; in Mongo, our queries will be straightforward.

Managers Collection (4 tables) => AwardsManagers, Managers, ManagersHalf, Master

The Players Collection (11 tables) => AllStars, Appearances, AwardsPlayers, Batting, BattingPost, Fielding, FieldingPost, Master, Pitching, PitchingPost, Salaries

The Seasons Collection (3 tables) => SeriesPost, Teams, TeamsHalf

If you’d like a description of the tables, check out the Baseball Databank README.

The MongoDB Side

For the rest of this post I am going to walk you through an example of a document stored in each collection. Along the way I'll also show examples of how to retrieve the data from the mongo console app. We will start with the simplest of the collections, the managers.


Managers

Any manager that has managed at least one game for either the New York or San Francisco Giants will have a document in this collection. A manager's document has demographic information and his managerial record, plus any awards he may have won. Our example managerial document is Rogers Hornsby’s. He managed the New York Giants in 1927.
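To give you a feel for its shape, here's a rough sketch of the document. The nameLast, yearID and inseason values come from the queries below; the _id, teamID and win/loss numbers are illustrative placeholders.

    {
        "_id" : "hornsro01",              // Baseball Databank managerID (assumed value)
        "nameFirst" : "Rogers",
        "nameLast" : "Hornsby",
        // ...other demographic properties...
        "record" : [
            {
                "yearID" : 1927,
                "teamID" : "NY1",         // illustrative team code
                "inseason" : 2,
                "W" : 0, "L" : 0          // placeholder win/loss totals
            }
        ]
    }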

The first property of the document is the _id property. By default each document would have an _id field containing a randomly generated ObjectId created by MongoDB during the insert. Here's what an ObjectId looks like:

ObjectId("528398bb3b06760000000004")

In some cases that may work fine, but for this collection I am using the Baseball Databank managerID value. This allows me to take advantage of the built-in unique constraint on the _id index. The document's properties are self explanatory; however, I would like to discuss the record property.

The record property is an array of JSON objects. Each entry in the array represents a full or partial season with the Giants. Since Hornsby only managed part of one season for the Giants, he only has one item in the array. You can think of each entry in the record array as a row in a record table in a relational database. Using an array lets us keep all data related to a manager in one document, making it easy to retrieve all of the manager's data when needed. We will make use of arrays in all of our documents. If the Giants had made the playoffs, or if Hornsby had won any managerial awards, his document would have two more array properties: playoffs and awards.

You may not have caught that last bit, but documents in the same collection do not have to have the same properties. In all of our collections the documents will have about 95% of their properties in common. I will show you how we can check for the existence of a property when querying the database in a bit. Stay tuned.

You might be wondering how I retrieved the Hornsby document. Let me walk you through the queries I used.

At a command prompt fire up the mongo client by running:

  mongo giants

This will connect you to a local instance of MongoDB and switch you into the giants database. You could also run just mongo to connect and then use giants at the mongo prompt to do the same thing. For more options, read the mongo shell documentation.
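In other words, this two-step version gets you to the same place:

    mongo
    > use giants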

Now that I'm in the giants database I can retrieve the Hornsby document by running:

   db.managers.find( { nameLast: 'Hornsby' }).pretty()

What the query says is: in the current database, search the managers collection for a document that has a nameLast property equal to 'Hornsby'. The find call will return all of the documents that match our query. The pretty function formats the results of the find in a more human readable fashion. Try the find call with and without pretty and you will see what I mean.

If you look closely at the Hornsby document you will see that his only entry in the record array has the inseason property set to 2. This means there were at least two managers for the 1927 Giants. To find out who the other managers were, we can run the following query.

db.managers.find( { 'record.yearID' : 1927} )

This will return two documents: one for John McGraw and one for Hornsby. Look closely at the query and you will see that I'm using 'dot notation' to find the managers for the 1927 season. Like the previous find, it will return the matching documents. Unlike the previous query, this one looks inside of the record array. Each entry in the record array has its yearID property compared to 1927; if at least one entry has that yearID then the document will be returned. Going back to my analogy of each entry in the record array being equal to a row in a relational database table, you can almost think of the dot notation as a join. One thing to keep in mind is that whenever you use dot notation you must quote the property like I have done. Failure to do so will cause an error from MongoDB.

Remember I said I would show you how to test for the existence of a property? We will get a count of all the managers who have the playoffs property in their document.

db.managers.find({playoffs: {$exists:true}}).count()

This query simply says: find all the documents in managers that have the playoffs property. I am using $exists, which is one of the built in query operators. To see what other operators are available, check out the MongoDB Operators page.

The last bit of work I need to do is to set up the indexes on the managers collection. Since most of my queries will be on either the last name or a particular season, I will add indexes on nameLast and record.yearID. Here’s how to create an index in mongo:

db.managers.ensureIndex({ nameLast: 1 })
db.managers.ensureIndex({ 'record.yearID': 1 })

These two calls create two separate indexes on the nameLast and record.yearID properties. Notice that I can also use dot notation when declaring an index. The 1 indicates that we want the index created using ascending ordering. To create an index that uses descending order, swap out the 1 for a -1. Now our managers collection has three indexes: one on the _id property, one on the nameLast property and one on the record.yearID property. To see what indexes are on a collection you can run:

db.managers.getIndexes();

For more information on ensureIndex and getIndexes visit:

http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/
http://docs.mongodb.org/manual/reference/method/db.collection.getIndexes/


Players

Just like the managers, every player that has stepped onto the diamond in a New York or San Francisco Giants uniform will have a document in this collection. A player’s document will contain demographics, statistics, and appearances. If the player has been an all-star, won an award or been inducted into the Hall of Fame, his document will have additional properties. Below is the document for Eddie ‘Hotshot’ Mayo, who played for the New York Giants in 1936.
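As a rough sketch of its shape (the fieldingStats property names come from the queries below; the _id and stat values are illustrative placeholders):

    {
        "_id" : "mayoed01",               // Baseball Databank playerID (assumed value)
        "nameFirst" : "Eddie",
        "nameLast" : "Mayo",
        // ...other demographic properties...
        "fieldingStats" : [
            { "yearID" : 1936, "POS" : "3B", "G" : 0 }   // games total is a placeholder
        ]
        // ...plus arrays for batting stats, appearances, etc.
    }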

The players document is considerably larger than the managers document. The reason I chose this design is that it allows me to retrieve the Giants history of a player with a single query. Even though I've chosen a player-centric design, it is still relatively easy to find roster related information. As an example, let's say we want to see who else played third for the Giants during the 1936 season. I could run the following query:
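Pieced together from the pipeline stages broken down later in this post, that query looks roughly like this:

    db.players.aggregate([
        { $unwind : "$fieldingStats" },
        { $match : { "fieldingStats.POS" : "3B",
                     "fieldingStats.yearID" : 1936 } }
    ])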

The query returns a total of four documents. Since these are full player documents, it is a little hard to pick out the names of the players. All I really want to see is nameLast, nameFirst and the value of fieldingStats.G for the players who played third. I can limit the output to only those values by using a projection. I am also changing the names of the properties to something I find a little nicer to read. Now when I run the query I should get four much easier to read results. The updated query and results are below.
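With the $project stage added (again assembled from the stages shown below):

    db.players.aggregate([
        { $unwind : "$fieldingStats" },
        { $match : { "fieldingStats.POS" : "3B",
                     "fieldingStats.yearID" : 1936 } },
        { $project : { _id : "$_id",
                       lastName : "$nameLast",
                       firstName : "$nameFirst",
                       games : "$fieldingStats.G" } }
    ])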

That returns the following:

Now that we have the data in a readable layout, I would like to sort the players so that the man who played the most games at third is listed first. Sorting is as easy as adding the $sort operator. Here's what the query looks like with the sort call added.
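Assembled from the individual stages discussed next:

    db.players.aggregate([
        { $unwind : "$fieldingStats" },
        { $match : { "fieldingStats.POS" : "3B",
                     "fieldingStats.yearID" : 1936 } },
        { $project : { _id : "$_id",
                       lastName : "$nameLast",
                       firstName : "$nameFirst",
                       games : "$fieldingStats.G" } },
        { $sort : { games : -1 } }
    ])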

Notice that I used the new name that I created in the $project call. The -1 indicates we want to sort the games in a descending fashion. The results of the updated query are below.

If you’ve been paying attention you noticed that I was using a function called aggregate instead of find. The aggregate function allows us to chain commands together. We can use the aggregation pipeline to ‘filter’ our data. It works by passing the results from one stage to the next, as illustrated in the $project and $sort calls. I used $project to rename the fieldingStats.G property to just games. I then used the new name, games, to sort by. Let’s walk through the last query to get a better picture of what’s going on.

$unwind

{ $unwind : "$fieldingStats" }, 

What $unwind does is create a new document for each member of an array. That means a copy of the demographics is put together with each entry in the fieldingStats array. So if a player has 10 entries, there will be 10 documents with the same demographic information, each containing a single entry from fieldingStats. I chose the fieldingStats property to $unwind on because I am only interested in third basemen. Notice that fieldingStats has a $ in front of it. Remember, that means you want to use the value of the fieldingStats property in the command. If I executed the query now with only the $unwind call I would receive the following message:

aggregation result exceeds maximum document size (16MB)

The message brings up one thing I haven't mentioned yet: all documents must be less than 16MB in size. Remember, the unwind creates many new documents. The players collection has 1675 documents in it; if each player has 5 years worth of stats for 3 different positions, you can see how the size of the result set will increase. Thankfully, in my case I'm filtering the results of the $unwind call down, so the 16MB limit is not a problem. In my three months of working with MongoDB I have yet to have the size limit cause any issues.

$match

{ $match : {"fieldingStats.POS": "3B",
             "fieldingStats.yearID" : 1936 }},

The output of the $unwind call is passed to the $match call as input. $match searches the input documents looking for documents that match the given parameters. In this case anyone who played third base during the 1936 season will be returned. The number of documents has gone from the thousands to four. The four complete player documents are then passed to the $project operator.

$project

{ $project : { _id : "$_id",
          lastName : "$nameLast",
         firstName : "$nameFirst",
             games : "$fieldingStats.G"}}

I've already gone over what the $project call does so I won't go into it again.

$sort

{ $sort : { games: -1 } } 

Since I have already gone over the $sort call, I won't do it again here.

Indexes

There will be a few more player API calls, so I am going to create a few more indexes with ensureIndex. Here are the players collection indexes.
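The exact calls aren't critical, but given the queries above they would be something along these lines (the specific properties indexed here are my assumption):

    db.players.ensureIndex({ nameLast: 1 })
    db.players.ensureIndex({ 'fieldingStats.yearID': 1 })
    db.players.ensureIndex({ 'fieldingStats.POS': 1 })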


Seasons

Each season the New York/San Francisco Giants have played in professional baseball is represented by a document in the seasons collection. Each document in the collection will have the team’s regular season and postseason records, team statistics, the roster, and a list of managers. The document below represents the 2012 season, when the Giants won their second World Series championship in 3 years.

To get the 2012 season document I used another select function, findOne. It is similar to find but it only returns a single document. In cases where more than one document matches, findOne will return the first document found in ‘natural order’, i.e. the first one stored on disk.
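The call would have been something like this (the yearID property name is an assumption, following the naming used in the other collections):

    db.seasons.findOne({ yearID: 2012 })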

The seasons document is similar to the players and managers documents in that it has a ‘core’ set of data that pertains to the team’s season, plus arrays that store information about the players who were on the team that year as well as the managers.

For the seasons collection I will add the indexes below. The API will make use of these indexes as you’ll see in the next post.
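Again, the exact set is an assumption, but an index on the season year is the obvious candidate:

    db.seasons.ensureIndex({ yearID: 1 })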

Summary

I have taken the data from 18 database tables and stored it in three collections in my MongoDB database. The new schema will allow us to make the fewest calls to the database when retrieving player, managerial or season related data. Throughout the post I showed you how to run queries in the mongo client using find, findOne and the aggregation pipeline. I hope this post helped illustrate some ways that MongoDB can be used to store data in ways that make using it easier.

Resources

MongoDB Doc Links

Baseball Sites
