Sonian's Email & Data Archiving Blog:

Sonian Summer Codefest 2011: Abundant Innovation - Part 1

The first quarterly all-engineering Codefest completed Tuesday evening (Aug. 16th) with three winning teams, one dramatic performance, and many laughs.

The entire company was invited to view the presentations and vote for their favorites. The only voting rule was you couldn’t vote for your own team. The judging was based on three criteria:

1. Impact on solving a Sonian or customer pain point (50%)

2. “Cool-ness” factor (25%)

3. Presentation style and effectiveness to convey the idea (25%).
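To make the weighting concrete, here is a minimal sketch of how the three criteria could combine into a single score. The 0-10 rating scale and the function name are assumptions for illustration; the post does not say how individual votes were actually tallied.

```python
# Weights taken from the three judging criteria listed above.
WEIGHTS = {"impact": 0.50, "coolness": 0.25, "presentation": 0.25}


def weighted_score(ratings):
    """Combine per-criterion ratings (hypothetical 0-10 scale) into one score."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


# Made-up ratings for one team.
example = {"impact": 8, "coolness": 6, "presentation": 10}
print(weighted_score(example))
```

Because impact carries half the weight, a one-point gain there moves the total twice as much as a one-point gain in either of the other two criteria.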

Thirteen teams competed, representing the four functional units in the Sonian Engineering organization: SAFE (back-end), Website (front-end), DevOps (systems management) and QA. There were several teams from each group. The themes ranged from automation and performance measurement to UI beautification and speed. Each team gravitated toward its “natural” inclinations.

The DevOps teams focused on automating manual tasks and removing friction from deployments. The SAFE team (back-end) showcased applying “math” to measuring performance and data classification. The website team looked at speed and a better user experience, and the QA team showed us new ways to think about cost-testing alongside bug testing.

Six teams had a metrics or analytics theme. Two teams focused on user interface improvements, and four teams came up with solutions for automation and deployment problems.

Instead of Ernst and Young tallying the votes, our Harvard MBA-trained ROI analyst, Chris H., stepped in to ensure a fair and accurate count.

And thanks to all the non-technical folks who sat patiently through presentations where terms like “latency,” “lazy loading,” “grepping logs” and “foreground queues” were discussed.

Teams chose their presentation order, with the QA team volunteering first. Below is a summary of the first six presentations with some context on how each idea fits into Sonian's needs and long-term vision. Another blog post will follow soon with the remaining seven presentations.

Team 1: “You paid what for that…Export job, Object list request, or ES cluster?”

Andrea, Gopal, Bryan and Jitesh from the quality assurance team got together around an idea to extend testing methodologies into infrastructure cost analysis. In order to maximize the cloud’s economic advantage, the engineering team is always thinking about the cost of software operating at “big data scale” levels of activity. From architecture to implementation, the goal is to infuse “cost consciousness” at every level. The QA team came up with a novel idea on this theme.

The proposed idea is to extend the testing framework to set a baseline of feature infrastructure costs, and then measure successive releases against the baseline. A significant cost deviation from the baseline could be considered a design flaw, implementation error or a SEV1 bug. Some sample features with measurable costs would be an import job, export request, or a re-index. Over time the entire app suite could have an expense profile established.
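A baseline check of this kind can be sketched in a few lines. This is not the QA team's framework; the feature names, dollar figures and the 20% tolerance below are all assumptions chosen for illustration.

```python
def cost_regressions(baseline, release, tolerance=0.20):
    """Return features whose cost grew more than `tolerance` over the baseline.

    baseline, release: dicts mapping feature name -> measured cost in dollars.
    tolerance: allowed fractional increase (0.20 = 20%), an assumed threshold.
    """
    flagged = {}
    for feature, base_cost in baseline.items():
        if feature not in release or base_cost <= 0:
            continue  # nothing to compare against
        change = (release[feature] - base_cost) / base_cost
        if change > tolerance:
            flagged[feature] = round(change, 3)
    return flagged


# Made-up per-run costs for three sample features.
baseline = {"import_job": 1.00, "export_request": 0.40, "reindex": 2.50}
release = {"import_job": 1.05, "export_request": 0.60, "reindex": 2.40}
print(cost_regressions(baseline, release))
```

In a test suite, a non-empty result could fail the build or open a bug, which is how a cost deviation would get the same treatment as a functional regression.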

Having QA be an additional “cost analysis layer” in the full development cycle will only help make the Sonian software as efficient as possible.

**Bonus points to this team for the most elaborate props and “dramatic performance.”

Team 2: Visualizing Beautiful Insights with Flowing Data

David, Drew, Greg and data analytics consultants Luke and Sean demonstrated several prototypes for visualizing unstructured data. The team shared a common goal: show customers what’s possible by looking at their data from a visual analytics perspective (e.g., pie charts, heat maps, tag clouds and social graphs).

David and Drew exported email header information (including attachments) from the index and formatted the output (JSON to CSV) so that Luke and Sean could import a CSV file into their visual framework. Greg created a simulated corporate directory and grouped the email addresses.
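The JSON-to-CSV step might look roughly like the sketch below. The field names are hypothetical, since the post does not describe the real export schema, and the sample records are invented.

```python
import csv
import io
import json

# Hypothetical field names; the real export schema is not described in the post.
FIELDS = ["sender", "recipient", "date", "attachment_type"]


def headers_json_to_csv(json_text):
    """Convert a JSON array of email-header records into CSV text."""
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells; unknown fields are dropped.
        writer.writerow({name: record.get(name, "") for name in FIELDS})
    return out.getvalue()


sample = json.dumps([
    {"sender": "a@example.com", "recipient": "b@example.com",
     "date": "2011-08-16", "attachment_type": "pdf"},
    {"sender": "b@example.com", "recipient": "c@example.com",
     "date": "2011-08-16"},
])
print(headers_json_to_csv(sample))
```

Fixing the column order up front keeps the CSV stable even when individual records omit a field, which matters when another tool imports the file by position.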

The resulting demo used real-world data to showcase communication patterns, frequent “sender/recipient” pairings, a social graph based on who talks to whom, and attachment file types relative to company organizational unit affiliation.
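Finding frequent sender/recipient pairings is, at its core, a counting problem. Here is a small sketch under the assumption that each message has already been reduced to a sender and a single recipient; the names are placeholders, not data from the demo.

```python
from collections import Counter


def top_pairs(messages, n=3):
    """Count (sender, recipient) pairs and return the n most frequent."""
    counts = Counter((m["sender"], m["recipient"]) for m in messages)
    return counts.most_common(n)


# Placeholder messages; each dict stands for one email header.
messages = [
    {"sender": "ann", "recipient": "bob"},
    {"sender": "ann", "recipient": "bob"},
    {"sender": "bob", "recipient": "cat"},
    {"sender": "ann", "recipient": "bob"},
    {"sender": "bob", "recipient": "cat"},
    {"sender": "cat", "recipient": "ann"},
]
print(top_pairs(messages))
```

The same pair counts can serve as edge weights when drawing a social graph, with each distinct address as a node.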

Visualizing data is relevant to Sonian as we start planning more analytics on the information in the archive and look to show customers “actionable intelligence” on their dark data.

Team 3: Team Hobo

Joe W. and Efren D. from the website team demonstrated the concept of using Vagrant (hence the team name?) on laptops to support running most of the application locally. In cloud development, one of the holy grails of developer efficiency is to be able to develop the website application either locally or on a cloud node. Vagrant allows individuals and teams to configure their developer environment more efficiently, whether running on EC2, Rackspace or local. Getting Vagrant to work correctly, consistently and reliably across all platforms is challenging.

This team demonstrated Vagrant working on MacBooks, with the website and dev tools running completely from the virtual environment, starting from “bare metal.”

Team 4: Awesome / Doppler

Bill, Thomas and Dan S. from Tier 2 Support showed us how they are innovating their tools to help make case resolution faster.

Tier 2 support steps in to solve the challenging support requests that Tier 1 support folks hand off. Tier 2 is the “glue” between engineering and the help desk, triaging new cases and researching vexing problems. Many Tier 2 requests involve locating the status of individual items in the archive, and this effort can be time-consuming given the distributed nature of cloud computing and the pipeline process that manages customer data.

Team Awesome / Doppler created a framework to search across the many log files spread across the compute infrastructure that powers the archive system.

Using a combination of bash scripts, Chef commands and their knowledge of the archive file system, the team created a menu-driven search capability that could zoom into a specific part of the system and search for log file statements based on customer ID, distribution partner, or another artifact.
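The team's tool was built from bash scripts and Chef commands. The sketch below is a Python stand-in for just the search step, with invented log sources and an assumed `customer=<id>` token format. It is meant to illustrate the idea, not to reproduce their implementation.

```python
import re


def search_logs(logs, customer_id):
    """Return (source, line) pairs whose line mentions the given customer ID.

    logs: dict mapping a log source name -> list of log lines.
    The `customer=<id>` token format is an assumption for illustration.
    """
    # Word boundaries keep ID 42 from matching inside 421.
    pattern = re.compile(r"\bcustomer=" + re.escape(customer_id) + r"\b")
    hits = []
    for source, lines in sorted(logs.items()):
        for line in lines:
            if pattern.search(line):
                hits.append((source, line))
    return hits


# Invented log sources standing in for files on different nodes.
logs = {
    "node-a/import.log": ["customer=42 import started", "customer=7 import done"],
    "node-b/index.log": ["customer=421 indexed 10 items", "customer=42 indexed 3 items"],
}
for source, line in search_logs(logs, "42"):
    print(source, line)
```

Returning the source alongside each matching line is what lets a support engineer see which part of the pipeline last touched a customer's data.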

Team 5: Pointy Haired Bosses (First Place Winner)

The results were quite impressive, and the team demonstrated the Viewer with a new data import capability and a new report showing a few sample accounts where the reported mailbox counts were lower than the calculated counts. In typical Codefest “right to the finish line,” the data was hot off the press.

Historically, keeping customers on an honest self-reported mailbox count was handled in the spirit of “trust but verify,” with the emphasis on trust. Now, with this new feature, the verification part can occur.
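The verification report boils down to comparing two numbers per account. A minimal sketch, assuming each account record carries a reported and a calculated count; the key names and sample accounts are hypothetical, and the real mailbox count algorithm is not shown in this post.

```python
def underreported_accounts(accounts):
    """Return accounts whose self-reported mailbox count is below the calculated one.

    accounts: iterable of dicts with hypothetical keys
    "name", "reported" and "calculated".
    """
    flagged = []
    for account in accounts:
        gap = account["calculated"] - account["reported"]
        if gap > 0:
            flagged.append({"name": account["name"], "gap": gap})
    # Largest discrepancies first.
    return sorted(flagged, key=lambda row: row["gap"], reverse=True)


# Invented sample accounts.
accounts = [
    {"name": "acme", "reported": 100, "calculated": 120},
    {"name": "globex", "reported": 50, "calculated": 50},
    {"name": "initech", "reported": 30, "calculated": 45},
]
print(underreported_accounts(accounts))
```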

The benefits to Sonian are numerous. Accurate billing data tightens the revenue story and gives us a benchmark to calculate true ratios of infrastructure expense to billable subscriber seats. What’s next: Fine-tune the mailbox count algorithm and then run the initial data gathering task across all index clusters to populate the database. Igor will propose a plan, and he and Joe K. will monitor the process.

Team 6: Next Gen Metrics

Two DevOps engineers, Sean P. and Justin K., solved a challenging systems administration problem.

Recording and analyzing system events and performance metrics is a “must-have” for distributed systems. The dynamic nature of cloud computing magnifies this problem since good metrics are needed for optimization, and receiving good metrics in a cloud environment requires better tools than the current state of the art. 

Sean and Justin proved Elastic Search could be a very useful storage repository for metric data gathered by the new cloud-scale monitoring project. The monitoring system collects detailed metric data from Sonian server processes (SAFE, Elastic Search, Website) and stores the data as JSON documents in an Elastic Search cluster. JSON is a lightweight data format and Elastic Search has native JSON capabilities. Elastic Search queries support data ranges and identifying facets in the data stream.
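To give a feel for the approach, here is an in-memory sketch of metric samples stored as JSON documents, then filtered by a range and counted by a facet field. It makes no Elastic Search API calls; the field names, sample values and integer timestamps are assumptions for illustration.

```python
import json
from collections import Counter


def to_document(process, metric, value, timestamp):
    """Serialize one metric sample as a JSON document (hypothetical field names)."""
    return json.dumps({"process": process, "metric": metric,
                       "value": value, "timestamp": timestamp})


def range_and_facet(documents, start, end, facet_field):
    """Keep documents with start <= timestamp < end, then count by facet_field."""
    parsed = (json.loads(doc) for doc in documents)
    selected = [d for d in parsed if start <= d["timestamp"] < end]
    return Counter(d[facet_field] for d in selected)


# Invented samples; timestamps are plain integers for simplicity.
docs = [
    to_document("SAFE", "queue_depth", 12, 100),
    to_document("Website", "latency_ms", 85, 110),
    to_document("SAFE", "queue_depth", 9, 120),
    to_document("SAFE", "queue_depth", 30, 300),
]
print(range_and_facet(docs, 100, 200, "process"))
```

A search cluster would do the filtering and counting on the server side across many nodes; the sketch only shows the shape of the question being asked of the data.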

