Friday, February 3, 2012

Speaking at RubyNation and Moderating Are There Angels Among Us?

I'll be giving a talk at RubyNation in Reston, VA on March 23rd or 24th, tentatively titled "Coding for Uncertainty: How to Design Maintainable, Flexible Code for a Startup Application". I plan to discuss lessons learned from building OtherInbox and subsequent projects, and how I try to hit my maximum sustainable development speed.

I will also be moderating a Think Big Baltimore event called Are There Angels Among Us? on 2/16 here in Baltimore, all about angel investment in the mid-Atlantic region. I have a couple of free tickets to share if anyone reading this blog would like to check it out. Email me if interested.

Hope to see you there!

Thursday, February 2, 2012

The one best way I know of to write software tests

Early in 2011 I had a prophetic conversation with fellow Baltimore hacker Nick Gauthier that radically changed the way I think about testing web applications. He described a methodology where you almost exclusively write high-level acceptance or integration tests that exercise several parts of your code at once (vs. the approach I had previously used, writing unit tests for every component backed up by a few integration tests). For a Ruby app this means using something like Capybara (depicted below), Cucumber, or Selenium to test the entire stack, the way a user would interact with your site.

These tests aren't meant to be exhaustive - you don't test every possible corner case of every codepath. Instead you use them to design and verify the overall behavior of the system. For example, you might write a test to make sure your system can cope with invalid user input:

describe "Adding cues" do
  let(:user) { Factory(:user) }
  before { login_as_user(user) }

  # Submitting the form without required fields should not create a Cue
  it "handles invalid data" do
    visit "/cues/new"
    select "Generic", from: "Type"
    expect { click_button "Create Cue" }.not_to change(Cue, :count)
    should_load "/cues"
  end
end

Usually with this technique you would not write a separate test for each type of invalid data, since tests like these are fairly expensive. Instead, you combine the test above with a series of unit tests which examine the components involved in the above behavior in isolation. Typically these tests will run much more quickly because they don't involve the overhead of setting up a request, hitting the database, etc.

In the above example we could cover all of the invalid cases with a model unit test that looks like this:

describe Cue do
  it { should validate_presence_of :name }
  it { should validate_presence_of :zip_code }
end

What you end up with is a small number of integration tests which thoroughly exercise the behavior of your code combined with a small number of extremely granular tests that run quickly and cover the edge cases.
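To make the split concrete, here is a minimal plain-Ruby sketch of the rule those two model tests pin down. It is a hypothetical stand-in for the real Cue model (which presumably relies on ActiveRecord validations); the class body, REQUIRED_FIELDS, and the error message are assumptions made for illustration only.

```ruby
# Hypothetical stand-in for the Cue model, with no Rails dependency.
class Cue
  # Assumed from the model tests above: both fields must be present.
  REQUIRED_FIELDS = %i[name zip_code].freeze

  attr_reader :name, :zip_code

  def initialize(name: nil, zip_code: nil)
    @name = name
    @zip_code = zip_code
  end

  # Returns a hash of field => message for every blank required field.
  def errors
    REQUIRED_FIELDS.each_with_object({}) do |field, found|
      value = public_send(field)
      found[field] = "can't be blank" if value.nil? || value.to_s.strip.empty?
    end
  end

  def valid?
    errors.empty?
  end
end
```

Each edge case (missing name, missing zip code, whitespace-only values) can be checked against an object like this in well under a millisecond, which is why it makes sense to leave those cases out of the slower integration test.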

One Criticism of This Approach

This idea has been working wonderfully for me. I feel like it gives me excellent code coverage without creating a massively long-running test suite. But I did notice Nick Evans critiquing this style of testing a while ago on Twitter:
lots of integration tests + very few unit tests => a system that mostly works, but probably has ill-defined domain models and internal APIs.
The fact that it got retweeted and favorited a number of times makes me think he's onto something, though I haven't run into this problem yet, and I'm rigorous about keeping domain models and APIs clean. I have no problems refactoring in order to keep my average pace of development high. In my experience adhering to a strict behavior-driven development approach has kept me from running into the problem he describes, but that might not hold if I were part of a team. Time will tell.

Tuesday, January 31, 2012

My twelve-factor app development environment (Unicorn and Pound)

I've been hugely influenced by the twelve-factor app manifesto written by Heroku. I really like building apps on Heroku and think it's pretty great for prototyping things, because it totally abstracts the need to think about devops in the early stages of a project. I don't think Heroku is a panacea (I don't have experience running something large on it), but it's great for getting something going quickly.

That manifesto illustrates how you end up building your app if you launch it on Heroku. I've found the principles greatly increase my productivity no matter what platform I'm using. One thing that did take me a little while to figure out: most of my apps are SSL-only, and I always want to interact with them over SSL in my development environment because odd things can crop up in production if you never test SSL locally. (Most of the advice on Rails blogs points you towards shutting off SSL in development mode, which seems crazy to me.)

At first I couldn't quite figure out how to make SSL work while complying with factor 7, Port Binding. I was still using Phusion Passenger with SSL configured as I described in this very old article (which still gets a lot of traffic).

Now here's what I do. I run the Pound reverse proxy using this configuration:

# http://lvh.me
ListenHTTP
  Address 127.0.0.1
  Port    80

  Service
    BackEnd
      Address 127.0.0.1
      Port    8080
    End
  End
End

# https://lvh.me
ListenHTTPS
  Address 127.0.0.1
  Port    443
  Cert    "/usr/local/etc/pound.pem"
  AddHeader "X_FORWARDED_PROTO: https"

  Service
    BackEnd
      Address 127.0.0.1
      Port    8080
    End
  End
End

The /usr/local/etc/pound.pem file is a locally-generated, locally-signed SSL certificate.
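One way to produce such a file is with Ruby's bundled openssl library. This is a rough sketch, not necessarily how the certificate above was made: the helper name self_signed_pem, the lvh.me common name, and the one-year lifetime are all assumptions. It returns the private key followed by the certificate in a single PEM string, which, as far as I understand Pound's Cert directive, is the combined form it expects.

```ruby
require "openssl"

# Hypothetical helper: build a self-signed certificate plus its private key
# as one PEM string suitable for writing to a file such as pound.pem.
def self_signed_pem(common_name, days: 365)
  key = OpenSSL::PKey::RSA.new(2048)
  name = OpenSSL::X509::Name.parse("/CN=#{common_name}")

  cert = OpenSSL::X509::Certificate.new
  cert.version = 2                      # X.509 v3
  cert.serial = 1
  cert.subject = name
  cert.issuer = name                    # self-signed: issuer equals subject
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after = Time.now + days * 24 * 60 * 60
  cert.sign(key, OpenSSL::Digest.new("SHA256"))

  key.to_pem + cert.to_pem
end

# Example (writes to the current directory, not /usr/local/etc):
# File.write("pound.pem", self_signed_pem("lvh.me"))
```

Browsers will still warn about a certificate like this, since nothing in their trust store vouches for it.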

When I want to view an SSL-only app in my local browser, I just start Unicorn (my app container of choice), which defaults to port 8080. Then I visit https://lvh.me, which is helpfully set to the loopback address 127.0.0.1. That hits Pound, which terminates the SSL connection (after a warning about the self-signed certificate), and proxies the web request to Unicorn.
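Because Pound terminates SSL, the app itself only ever sees plain HTTP, so the header added by the AddHeader line is what tells it the original request was secure. A simplified sketch of that check follows. It assumes the header reaches the app as the Rack env key HTTP_X_FORWARDED_PROTO, and forwarded_https? is a hypothetical helper; Rack's real scheme detection considers more signals than this.

```ruby
# Hypothetical helper: did the proxy in front of us receive this request over HTTPS?
# `env` is a Rack-style environment hash.
def forwarded_https?(env)
  # Proxies may append values ("https, http"); the first entry is the original scheme.
  original = env.fetch("HTTP_X_FORWARDED_PROTO", "").split(",").first
  original.to_s.strip.downcase == "https"
end
```

An SSL-only app would redirect or reject any request for which a check like this returns false, which is exactly the behavior worth exercising in development.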

This is very similar to what happens when the app runs in production on Heroku's systems, except that they use a Procfile in the root directory of the app to configure Unicorn. Per factor 7, it allows Heroku to bind my Unicorn instance to any arbitrary port:

web: bundle exec unicorn -p $PORT -c config/unicorn.rb
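The same idea in miniature: the process reads its port from the environment rather than hardcoding it. This sketch is only an illustration of factor 7, with a hypothetical port_from_env helper; Unicorn handles this itself via the -p flag, and the 8080 fallback simply mirrors the local default mentioned above.

```ruby
# Assumed fallback for local development, matching the port Pound proxies to.
DEFAULT_PORT = 8080

# Hypothetical helper: pick the port to bind from a hash-like environment.
def port_from_env(env = ENV)
  Integer(env.fetch("PORT", DEFAULT_PORT.to_s), 10)
end
```

Locally, with PORT unset, this yields 8080; on a platform that injects PORT, the same code binds wherever it is told to.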

Monday, January 30, 2012

Simple Resque lets you send Resque jobs from one codebase to another

I just released a small gem called simple_resque which abstracts a pattern that's become very common in my recent projects. I like using Resque as a job queue to move as much work out of the web application as possible. Unlike the usual Resque setup, I never put the workers in the same codebase as the web app. I like to keep the asynchronous parts of the app completely separate from the code that services web requests.

This required some hacking since Resque expects you to pass a Ruby class constant for the worker, but the webapp doesn't have those classes defined. simple_resque provides a thin wrapper over Resque's push method that mimics the way Resque.enqueue works, but doesn't require you to use a class constant.
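The underlying trick can be sketched without the gem. A Resque job is stored as a small JSON document naming the worker class as a string, so the sending side never needs the constant itself. The helper below is hypothetical and is not simple_resque's actual API (see the repository for that); the worker and queue names are made up for the example.

```ruby
require "json"

# Hypothetical helper: build the job payload from a worker *name*,
# so the web app never has to load the worker class.
def build_job_payload(worker_name, *args)
  { "class" => worker_name.to_s, "args" => args }
end

payload = build_job_payload("ThumbnailWorker", 42, "large")

# With the resque gem loaded, a payload like this could be queued with:
#   Resque.push("thumbnails", payload)
puts JSON.generate(payload)
```

The worker codebase, which does define ThumbnailWorker, picks the job up as if it had been enqueued the usual way.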

For more details check out: https://github.com/subelsky/simple_resque

Thursday, January 12, 2012

OtherInbox acquired by ReturnPath

I've written on and off about my experiences using Ruby and JavaScript and other technologies to build OtherInbox, the company I cofounded with Josh Baer. Today I'm happy to announce that the company has been acquired by ReturnPath.

It was a great ride, and I learned a ton. Congratulations to the whole team! 

I am now starting another round of the entrepreneurial cycle and have started working on something new and very cool. As usual I don't plan to announce details until the business is up and running and ready for new customers. So stay tuned!

Friday, December 23, 2011

Who owns vacant properties in Baltimore?

I received many ideas for my free software project and ultimately settled on one suggested by Kate Bladow: a tool to help identify potential slumlords in Baltimore. It's specifically designed to help Baltimore Slumlord Watch investigations, though that anonymous blogger has nothing to do with this tool (he or she has to do complete investigations of each property before writing a post). This is more like an experiment to use all available data to identify people and companies who may own a large number of vacant properties.

The tool combines data from three sources:
  1. State of Maryland Real Property database: to get a complete list of every property in Baltimore, identified by a block and lot number (this database, unlike #2, allows wildcard searching, which is how we get the complete list). Includes a truncated field listing the owner name.
  2. Baltimore City Real Property database: to find the complete owner name and mailing address.
  3. Baltimore's Vacant Lots and Vacant Buildings open data sets. The anonymous slumlord watch blogger says that these are not very accurate or up-to-date, but hopefully they are good enough for us to identify who the main offenders are.
I applied a few cleanups and transformations to make the data more useful, and used the excellent Google Refine tool to try and reduce the noise I found in the Owner Name column. Many entities were listed under a variety of spellings, punctuations, and abbreviations, which Google Refine helped me combine. Thanks to Mark Headd for recommending Google Refine to me.
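For a sense of what that kind of clustering does, here is a rough Ruby sketch loosely modeled on the "fingerprint" key-collision idea: normalize each name into a key, then group names that share a key. It is not the code used for this project, and it is much cruder than Google Refine. The owner_key and vacants_per_owner helpers and the digit-collapsing rule are assumptions for illustration, and a rule this blunt will also merge some names that should stay separate.

```ruby
# Hypothetical helper: reduce an owner name to a normalized clustering key.
def owner_key(name)
  name.downcase
      .gsub(/\d+/, "#")        # collapse digit runs: "NB1" and "NB2" both become "nb#"
      .gsub(/[^a-z#\s]/, "")   # drop punctuation
      .split                   # tokenize on whitespace
      .uniq
      .sort                    # word order no longer matters
      .join(" ")
end

# Hypothetical helper: given one owner name per vacant property,
# count vacants per cluster, largest first.
def vacants_per_owner(owner_names)
  owner_names
    .group_by { |name| owner_key(name) }
    .map { |key, names| [key, names.size] }
    .sort_by { |key, count| [-count, key] }
end
```

With keys like these, "JAMES E. CANN" and "CANN JAMES E" land in the same group, as do "NB1 Business Trust" and "NB2 Business Trust".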

Below you will find a few lists of the top property owners in Baltimore gleaned from these tools.

Important Caveats
  1. Some properties are owned by companies using a series of one-up numbered company names (like "N# Inc." or "NB1 Business Trust", "NB2 Business Trust", etc.). I used Google Refine's clustering feature to combine similar names on the assumption that these are probably controlled by the same people. In the cases where I did this kind of grouping, I used sentence case instead of upper case or I replaced digits with the # sign.
  2. Many properties are owned by a uniquely-named LLC (like "1 E. Montgomery LLC"). One person or company could own a significant share of the vacant properties in Baltimore via shell corporations like this. One potential way to get around this is to look up the incorporation paperwork for each company (also available as a scrapeable database), but I'm assuming if you're smart enough to use shell corporations you're probably using a different company to be a registered agent. So this technique would probably only help us identify the main registered agents for the vacant property owners in Baltimore.
  3. I haven't done a great deal of authenticating or verifying. All I'm trying to do is make this data more discoverable/explorable. Obviously you should do your own homework before acting on any of this information.
  4. I was really surprised to see how much property is controlled by the city. Even if the absolute numbers below are inaccurate the relative amount is pretty amazing. I'd like to see the city take some bold leadership on doing something with all of those buildings and lots. How about a revival of the dollar home program?
  5. I only focused on properties listed as non-owner occupied by the State of Maryland.
  6. The Slumlord Watch blogger says that the city's vacant building data is inaccurate and not up-to-date, so there may be false positives and negatives in the list.

Largest Vacant Property Owners in Baltimore, Grouped by Name
Owner | # Vacants
Baltimore City | 1407
UP# BUSINESS TRUST | 38
SS# BUSINESS TRUST | 25
JAMES E. CANN | 24
NB# Business Trust | 24
State of Maryland | 19
2008 DRR-ETS, LLC | 18
BALTIMORE RETURN FUND, LLC | 18
EAST BALTIMORE DEVELOPMENT LLC | 18
COMPOUND YIELD PLAY, LLC | 17
CE REALTY, LLC. & EPHRAIM WEINGARTEN | 16
KONA PROPERTIES, LLC | 16
CE REALTY, LLC | 15
J.A.M. numbered corporations | 15
BALTIMORE PREFERRED PROPERTIES LLC | 14
DRUID HEIGHTS COMMUNITY DEVELOPMENT CORPORATION | 14
HOLABIRD INVESTMENTS, LLC | 14
NEW HORIZON DEVELOPMENT, LLC | 14
DOMINION PROPERTIES LLC | 13
COMMUNITY SOLUTIONS, LLC | 12
M&S JOINT VENTURE DEVELOPMENT CORPORATION | 12
MAHS-BE HOLDINGS, LLC | 12
BALDWIN TRUSTEE, LEROY | 11
HARRISON DEVELOPMENT, LLC | 11
HUD | 11
CHESAPEAKE HABITAT FOR HUMANITY INC | 10
KGB numbered corporations | 10
University of Maryland | 10
L.A.M.B., INC. | 9
REBUILD AMERICA, INC | 9
CARTER, NATHAN | 8
EQUITY TRUST COMPANY | 8
KREISLER, SANFORD | 8
LAMB, DERRICK | 8
N-#, INC. | 8
OAKMONT DESIGN LLC | 8
SANDTOWN HABITAT FOR HUMANITY | 8
DOMINION RENTALS, LLC | 7
GREEN, CARL | 7
HARBOUR PORTFOLIO | 7
LEO, CAROLINE G. | 7
N10 BUSINESS TRUST | 7
NEIGHBORHOOD PROPERTIES-4, INC | 7
SAUNDERS TERRAINE | 7
EAST BALTIMORE DEVELOPMENT, INC | 6
APP CONSULTING GROUP, LLC | 6
DJ LAND CO, LLC & WODA GROUP LLC | 6
EMERALD BAY DEVELOPMENT GROUP & ONE, INC. | 6
FIRST NATIONAL DEVELOPMENT, LLC | 6
JOHNSON, MARTIN | 6

You can also download the entire list of non-owner-occupied vacant building owners in Baltimore.

Largest Vacant Lot Owners in Baltimore, Grouped by Name
Baltimore City | 2926
B&D PHASE III, LLC | 64
METRO II OLDHAM, LLC & SUNNYS ASSOCIATES, LLC | 42
CAMDEN ASSOCIATES, LLC. | 40
HARBORVIEW LIMITED PARTNERSHIP NO. # | 35
State of Maryland | 32
LOWMAN ST.,LLC | 31
Oblate Sisters of Providence | 27
BG&E | 23
COMPANY, LLC & FEDERAL HILL HOLDING & SCC CANYON II, LLC | 23
ATLAS MD I SPE, LLC & BB&T BANK (CREO), ATTN: T. GEORG | 19
J & J PARTNERSHIP, INC. | 19
Baptist Church | 18
SANDTOWN HABITAT FOR HUMANITY | 18
NANTICOKE INVESTMENT CO., LLC | 17
L.A.M.B., INC. | 15
CSX TRANSPORTATION, INC. & TAX DEPARTMENT | 13
DRUID HEIGHTS COMMUNITY DEVELOPMENT CORPORATION | 13
SINGER PARK & PLAY, INCORPORATED | 13
STATION PLACE LLC | 13
TRIMARK MANAGEMENT | 13
ASSOCIATION, INC & MCHENRY POINTE HOMEOWNERS | 12
Benedictine Society of Baltimore | 12
CHESAPEAKE HABITAT FOR HUMANITY & INC | 12
JUBB JR, WALTER H & JUBB, EDWARD H | 12
CASTLEWOOD COMMUNITIES, LLC | 11
MOUNT SINAI BAPTIST CHURCH & OF BALTIMORE CITY | 11
MARYLAND JOCKEY CL | 10
CONVENTION AND AUXILIARIES & OF BALTIMORE, INC. & UNITED BAPTIST MISSIONARY | 9
DUNN, GREG | 9
RIVERSIDE WORK FORCE LLC | 9
BALTIMORE URBAN LEAGUE, & INC.,THE | 8
C&P TELEPHONE COMP | 8
CORPORATE SECRETARY, AMTRAK & NATIONAL RAILROAD & PASSENGER CORPORATION | 8
DEVELOPMENT CORPORATION & DRUID HEIGHTS COMMUNITY & JACQUELYN D CORNISH | 8
FRP HOLLANDER 95, LLC | 8
HOLABIRD PARK APTS. INC | 8
MUELLER HOMES, INC. | 8
NEWSTAR DEVELOPMENT AT CANTON & PEAKS, LLC | 8
SCARFIELD SR, FRANK D | 8
THE KCR DEVELOPMENT GROUP & SPICER'S RUN HOMEOWNER ASSOCIATION |
BALTIMORE SCRAP CORP. | 7
BRIGHTON DEVELOPMENT GROUP & LLC | 7
CHURCH, THE & VESTRY OF MOUNT CALVARY | 7
FLAG HOUSE RENTAL I, L.P. & METRO PLAZA II | 7
FOWLKES, ROBIN | 7
PARADIGM BUILDERS, LLC & RICHARD MIRSKY - OFFIT KURMAN | 7
URBAN HEALTH INSTITUTE OF & WASHINGTON, THE | 7
CANN JAMES E | 6
CHURCH OF THE REDEEMED OF THE & LORD, INC, THE | 6

You can also download the entire list of non-owner-occupied vacant lot owners in Baltimore.

The Raw Data
All data used to create the above tables can be downloaded from GitHub, including the raw CSV data.

The Code

It's Creative Commons-licensed and posted on GitHub. It's pretty raw and unfactored. I ran it all from irb. It needs to be converted into a Rake task or other command-line friendly, totally-automated package.

Next Steps
  • We could get this up and running on ScraperWiki to have the data constantly updated.
  • We could run an Amazon Mechanical Turk project to create an up-to-the-minute database of vacant houses in Baltimore, using Google Street View. We could just ask each worker to use street view to make an estimate of whether the house was vacant or not. I'm sure there would be some inaccuracy but the data ought to be good enough to help further investigations.

Friday, December 16, 2011

The only social media advice you need

I keep thinking about How To Be Interesting, a 2006 blog post I read a few months ago. Russell Davies captured the essence of my social media strategy, what social media means to me, and why it's been so successful for me and others. If anyone ever asks me for advice in this arena, I am going to quote Davies:
...the core skill of any future creative business person will be 'being interesting'. People will employ and want to work with (and want to be with) interesting people.
Social media is a big deal because it helps you do the two things Davies recommends to cultivate that skill of "being interesting":

The way to be interesting is to be interested. You’ve got to find what’s interesting in everything, you’ve got to be good at noticing things, you’ve got to be good at listening. If you find people (and things) interesting, they’ll find you interesting. 
Interesting people are good at sharing. You can’t be interested in someone who won’t tell you anything. Being good at sharing is not the same as talking and talking and talking. It means you share your ideas, you let people play with them and you’re good at talking about them without having to talk about yourself.
It's not rocket science: social media helps you find more interesting things (such as when I found this article via Hacker News) and share them (like I'm doing with this post).