“What is this you call property?”, asked Massasoit, the leader of the Native American Wampanoag tribe. “It cannot be the earth, for the land is our mother, nourishing all her children, beasts, birds, fish and all men. The woods, the streams, everything on it belongs to everybody and is for the use of all. How can one man say it belongs only to him?”
Good question, Massasoit. Yet, due to a tragic combination of the pathogenic bacteria Leptospira and aggressive colonists the answer became irrelevant and the concept of land ownership proliferated through the majestic lands of the new world like a virus.
Today, the dust has settled and the iron horse has carried the white man to the west coast, where I currently reside. As I explored the wonderful city of Los Angeles I began to wonder to whom, exactly do I owe the pleasure of my environment? Who “owns” the dirt I stand on? So I did some research.
The “United States” is divided into 3,144 counties and county equivalents. Of these, Los Angeles county is the most populous, with over 10 million residents. The least populous, Loving County, Texas has only 82. Funny story, in 2006 a group of Libertarians attempted to buy up land and seize power in Loving County with the goal of establishing their ideals, but were thwarted by the local sheriff. The group is currently featured on a “Wanted” poster in the county’s sole courthouse.
LA County has an area of 4,751 mi2, divided across 88 cities and 2,379,680 parcels. However, much of the land is “unincorporated”, meaning it does not fall within the jurisdiction of an established city. If you would like to establish your own city in LA County you can apply to the LAFCO for as little as $2,5001. The information regarding parcel owner, location, and “assessed value” for collecting property taxes is maintained by the Assessor’s Office2.
Formats and Tools
Most, if not all, counties use GIS (geographic information systems) to maintain this data3. The LA office uses Microsoft Access for ownership and assessed value information, and the popular Shapefile format for geometry and mapping. ESRI (environmental systems research institute), founded in 1969, dominates land-use consulting with their popular ArcGIS software and Shapefile format developed in the early 1990s. A Shapefile consists of several different files, 3 of which are mandatory:
.shp – feature geometry as a set of either WKT (well known text) of WKB (well known binary) coordinates. Each of these entries can be one of several different simple datatypes such as
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
.shx – index of positional geometry to allow quickly stepping forward and backward
.dbf – old school simple database format popular in the 1990s, here stores attributes for each shape
There are several optional files, the most important of which though is
.prj – represents the projection information of the coordinates in the shapes. More on this soon.
I performed extensive cleaning and simplification on the assessor’s office data as part of this analysis, the bulk of which was done with PostgreSQL and the fantastic PostGIS extension4.
If you want to follow along, say with an EC2 instance, first grab some dependencies.
sudo apt-get -y install postgresql postgresql-contrib postgis postgresql-9.3-postgis-2.1-scripts
Now let’s create a database for our geospatial data
createdb gis psql -d gis -c 'create extension postgis'
The earth is not a perfect sphere. We represent it instead as a “geoid”, a mathematical object that, ideally, represents the precise shape of the earth if it were only under the influence of gravitation and rotation. While imperfect, the geoid, combined with satellite data provides a somewhat close approximation to the actual shape of the earth. The geoid works in tandem with different “datum”, which are coordinate systems used by regions to define a coordinate system consistent with the geoid. Today, improvement to the model and coordinate systems has led to the possibility of a single global standard for the globe, WGS84, that is gaining in popularity. Still, datums are typically more precise when defined only for a single region.
The datum used by the LA Assessor’s office is NAD1983. The naming convention originated with the first North American survey in 1901, based on an ellipsoid geoid model developed in 1866. The system was updated in 1927 based on surveys of the entire continent but using the same geoid, and updated again in 1983 using satellite and remote sensing data using GRS 80 as the geoid, the same model originally used by the popular global standard WGS84. If it sounds simple, it is not, but if you’re interested it’s a great reason to learn spherical harmonics.
Just remember a datum is a coordinate system defined on a geoid, which is a model of the earth. Geoids and datums exist for other planets too, like Mars.
I re-projected the assessor’s office data from NAD1983 to WGS84 using QGIS. All projections have a corresponding SRID (spatial reference system identifier). Let’s load the shapefile into a PostGIS table, making sure to tell it the projection WGS84, which has an SRID of 43265. You can download it from me.
wget http://dwur9qzdkvp67.cloudfront.net/la_parcels.tar.xz tar -xvf la_parcels.tar.xz shp2pgsql -I -s 4326 -g geom la_parcels.shp la_parcels | psql -d gis > import.log
Next let’s get an SQL prompt
psql -d gis
And run a simple query. This should improve performance a bit.
Now let’s have some fun. What are the most expensive pieces of land in LA County?
select land_value, owner_name, address_number, street_name from la_parcels order by land_value desc limit 10;
|252,479,578||CATALINA MEDIA DEVELOPMENT II||3000||ALAMEDA AVE|
|183,110,154||MOBIL OIL CORP||3700||190TH ST|
|154,291,320||BH WILSHIRE INTERNATIONAL LLC||9900||WILSHIRE BLVD|
|137,295,210||BP WEST COAST PRODUCTS LLC||1801||SEPULVEDA BLVD|
|136,879,311||WARNER BROS ENTERTAINMENT INC||4000||WARNER BLVD|
|135,354,000||NEXT CENTURY ASSOCIATES LLC||2025||AVENUE OF THE STARS|
|132,214,693||UNIVERSAL STUDIOS LLC||3900||LANKERSHIM BLVD|
|129,077,263||TWENTIETH CENTURY FOX FILM CORP||10201||PICO BLVD|
|126,062,515||UNIVERSAL STUDIOS LLC||3900||LANKERSHIM BLVD|
|122,666,461||TISHMAN SPEYER ARCHSTONE SMITH||3600||BARHAM BLVD|
Entertainment and oil companies dominate here. That is land only though. I wonder which of these has the most expensive “improvement”, or building? Let’s use a nested query.
select improvement_value, owner_name, address_number, street_name from (select land_value, improvement_value, owner_name, address_number, street_name from la_parcels order by land_value desc limit 10) as lands order by improvement_value desc;
20th Century Fox wins with $265 million. Exxon Mobil’s sprawling refinery is assessed at only $19 million. Someone must be trying hard to keep property taxes low. Okay what about the most expensive properties overall? Combining land and building value?
select (land_value+improvement_value) as total_value, owner_name, address_number, street_name from la_parcels order by total_value desc limit 20;
|628,970,661||CHILDREN HOSPITAL OF LOS ANGELES||4550||SUNSET BLVD|
|550,125,487||KAISER FOUNDATION HOSPITALS||9343||IMPERIAL HWY|
|523,472,588||CEDARS SINAI MEDICAL CENTER||8720||ALDEN DR|
|521,206,676||TRS OF THE J PAUL GETTY TRUST||199||CHURCH LANE|
|515,115,546||TRS OF THE J PAUL GETTY TRUST||1200||GETTY CENTER DR|
|511,156,285||TRS OF THE J PAUL GETTY TRUST||0|
|511,156,285||TRS OF THE J PAUL GETTY TRUST||0|
|478,152,504||CENTURY CITY MALL LLC||10250||SANTA MONICA BLVD|
|477,297,278||HAY,DOROTHY L DECD EST OF AND||121||LA CIENEGA BLVD|
|475,058,030||ANHEUSER BUSCH INC||15800||ROSCOE BLVD|
|466,751,222||TRIZEC 333 LA LLC||333||HOPE ST|
|439,548,987||MARANGI,LEONARD M ETAL TRS LESSR||100||CONGRESS ST|
|439,000,000||COMMUNITY REDEVELOPMENT AGENCY||350||GRAND AVE|
|420,500,000||WILSHIRE COURTYARD LP||5700||WILSHIRE BLVD|
|397,617,220||DISNEY,WALT PRODUCTIONS INC||500||BUENA VISTA ST|
|396,907,059||CEDARS SINAI MEDICAL CENTER||127||SAN VICENTE BLVD|
|394,172,461||TWENTIETH CENTURY FOX FILM CORP||10201||PICO BLVD|
|376,000,000||2121 AVENUE OF THE STARS LLC||2121||AVENUE OF THE STARS|
|364,457,522||1999 STARS LLC||1999||AVENUE OF THE STARS|
|361,003,213||UNIVERSAL STUDIOS LLC||3900||LANKERSHIM BLVD|
Hospitals monopolize the top spots. Healthcare is expensive. No surprise to see the magnificent Getty Center either. I wonder if the multiple entries are redundant or it’s worth $2 billion. Wouldn’t be surprised either way. Did you know it’s free? Free! Unlike the hospital.
Let’s find another landmark. How about Dodger Stadium? We’ll use a forgiving string compare to make sure we match the street.
select (land_value+improvement_value) as total_value, owner_name, address_number, street_name from la_parcels where address_number = 1000 and street_name ilike 'elysian park%'
|84,409,139||“REALCO INTERMEDIARY LLC”||1000||“ELYSIAN PARK AVE”|
Dodgers stadium must be worth more than $84 million. How do the assessed values compare to real world values? Let’s use One Wilshire as an example. It sold in 2013 for $437.5 million and its assessed value is $297.5 million. Not too far off.
Now let’s use the aggregation function sum() and group by to find the most expensive cities by area in LA County. Since the city column is still a bit messy we’ll use having to eliminate the outliers.
select city, price_per_area from (select sum(area) as area, sum(land_value) as land_value, count(ain) as parcels, city, (sum(land_value) / sum(area)) as price_per_area from la_parcels group by city having count(ain) > 1000 order by price_per_area desc) as cities
|PALOS VERDES EST||47.59576314||4671096274|
|LA CANADA FLT||31.25619651||3794539133|
I still want an answer to my original question. Who owns the most land?
select owner_name, count(ain), sum(land_value)+sum(improvement_value) as holding, sum(area) as lands from la_parcels group by owner_name order by lands desc limit 10
|U S GOVT||2695||406477416||33837253322|
|STATE OF CALIF||1387||271712949||2551166380|
|L A CITY||5498||1123805464||1586934789|
|SANTA CATALINA ISLAND||79||22222591||1504028479|
|L A COUNTY||1768||603224059||870795789|
|TEJON RANCH CO||59||5411765||703066562|
|L A CITY DEPT OF WATER AND POWER||2340||212220963||627091955|
|NEWHALL LAND AND FARMING CO||415||199329086||618742553|
|MOUNTAINS RECREATION AND||622||70849023||487140373|
|L A CO FLOOD CONTROL DIST||3916||54042362||448175685|
By area huge swaths of the county is controlled by the federal and state government, with a few agriculture companies such as Tejon Ranch and Newhall Land and Farming Company sprinkled in.
More Advanced Queries
I wonder what percentage of LA County is not held by one of the 10 entities above or is unincorporated? Here we will use a view, which allows you to treat a query like its own table.
create view top_owners as select owner_name from (select owner_name, count(ain), sum(land_value)+sum(improvement_value) as holding, sum(area) as lands from la_parcels group by owner_name order by lands desc limit 10) as owners
Now in order to select the lands these guys own we’ll use a join statement, specifying where the view and table intersect. Join by default is inner, meaning we’ll only get rows where the field matches. Now we can succinctly find the parcels owned by these entities.
select count(*) from la_parcels join top_owners on la_parcels.owner_name = top_owners.owner_name
And finally use a union operation, which can combine multiple select statements into a single column.
select sum(area) from la_parcels join top_owners on la_parcels.owner_name = top_owners.owner_name union select sum(area) from la_parcels
About 30%. And finally what % of the land in the city of LA is devoted to public space? I think this is an important metric for any city.
select sum(area) from la_parcels where owner_name ilike 'l a city' or owner_name ilike 'l a city park' union select sum(area) from la_parcels where city ilike 'los angeles'
8%. Not bad. Griffith park, Elysian park, MacArthur park, Runyon Canyon, Grand park, Vista Hermosa, LA has some fantastic public spaces. The largest green areas are owned by the federal or state government and are outside the city, though not terribly far. Due to lack of Zoning Code standardization it is difficult to get a good picture of what the lands are used for.
At last, we unleash PostGIS. Note that since we are using WGS84 our results will be in latitude and longitude rather than meters as above.
First it’s good to know what we’re working with.
select distinct GeometryType(geom) from la_parcels
The geom field is exclusively MULTIPOLYGON. If we want to work with simpler shapes we can unroll them with ST_Dumps() in to the POLYGON type.
Alright I wonder where is the geographic center of LA County? We’ll use ST_Extent() to roll up all of our geometries in to a bounding box, and find the X and Y coordinates of its center with ST_Centroid().
select ST_Y(ST_Centroid(ST_Extent(geom))), ST_X(ST_Centroid(ST_Extent(geom))) from la_parcels
This result is from a simple box around our shapes. That is not very rigorous. Instead we should roll up all of our shapes together and form a “convex hull”, the minimum geometry that encloses them, and find the centroid of that. Let’s find how far the center of Malibu is from that point. Here we use a geometry constructor. the true in the ST_Distance() function gets us the distance across the geoid rather than straight through.
select ST_Distance(malibu_center, ST_MakePoint(-118.29553231211, 33.80924996626, 1)) from (select ST_Centroid(ST_ConvexHull(ST_Collect(geom))) as malibu_center from la_parcels where city ilike 'malibu') as malibu
0.505358312522112 is our answer. That’s quite a drive. Especially in traffic.
select distinct ain, land_value + improvement_value as total_value, land_value, improvement_value, owner_name, year_built, address_number, street_name, city, state, zip_code, zoning_code, area, perimeter, ST_AsGeoJSON(ST_MakeValid(geom)), ST_AsGeoJSON(ST_Centroid(ST_MakeValid(geom))) from la_parcels where geom @ ST_MakeEnvelope(%s,%s,%s,%s)
The bounding box of the map is passed to the %s parameters. A few of the geometries are invalid so ST_MakeValid() helps us out there. The query is amazingly fast, around 12ms. Drawing on the map is the slow part. Google Maps has gotten too complicated. But it still works pretty well as long as you don’t zoom out too far and avoid residential areas. The american dream of individual home ownership is slowing down my app. Hopefully I can speed it up but for now you can play around with it here. Click on the geometry to see the value. Some buildings are broken up in to many individual parcels in three dimensions. That is the difference between an apartment and a condominium. In a condo, you own the parcel from the county.
I was disappointed with how difficult it was to obtain this data initially and the poor quality it came in. Governments and constituencies of all sizes stand to benefit enormously from investment in modern software tools and stronger commitments to transparency.
While building I began to dream of having all the parcels of the United States in a single database. That would be a fascinating study, but the data is horribly spread about and fragmented. If you are interested in obtaining the data for your county, or another, and structuring it in to the same schema I would be very grateful, and promise to share the collected information. You can track the completeness of what has been gathered for California here. Please contact me if interested in contributing.
What if we had the data for other nations too? Could we put the entire world in a computer?
 to the pedants that still insist on using “these data” give it a rest, it’s confusing to most people
Do you love databases? Soylent is hiring a Chief Database Architect.