Damn Dirty Data

No, this post isn’t Star Trek related. It’s about SCIENCE!

I realize that the real world makes a lot of things difficult for those who collect data. Consider this, then, a gripe list of things I may or may not expect to get fixed.

Moving on, I work on computer models to better understand physical processes. These models are built and parameterized off of real world data of one kind or another. This means I consume a lot of data in my job. However, how I as a modeler would like data is not often how I get the data. Here are some of the bigger issues:

  • Oddball formats. On a given day, I have to deal with files in Excel format, varieties of plain text, oddball Fortran binary files (undocumented, of course), NetCDF, HDF, ESRI Shapefiles, GeoTIFF, and a whole bunch of other raster formats. And this doesn’t particularly faze me, either. What fazes me? Incomplete oddball formats. Whenever a new and interesting bit of geodata ends up on my desk in an oddball format, I can wind up spending over a day poking at it. When I get gridded data, I need to know a few details about each datapoint in the grid:
    1. The lat/lon coordinates
    2. Units being used
    3. The digital representation of the data

    Most data I get tends to follow this: well-put-together NetCDF & HDF files, as well as anything produced by or with a GIS (such as GeoTIFF, Shapefiles, etc.), tend to be alright. I can work with this. However, if I have to figure things out, it slows me down. Sometimes a lot, if I misinterpret it: if it looks reasonable to my eye but actually is not, this Can Cause Badness. This is largely the case with Fortran binary files, but I get a lot of plain text that shares these problems too; I have an unclear idea of where or what is being represented. Examples:

    1. Points are addressed from 0 to 360 degrees east instead of -180 to 180 (or vice versa)
    2. There may be a README file with the line “value is thousands / 20” or something similarly vague.
  • Oddball Measurements. Let’s start with something basic. If a value is supposed to be between zero and one, then why do your measurements include numbers both less than zero and greater than one? Saying “sometimes in real life, things work out like that” doesn’t satisfy me in building a numerical model where the textbook definition of a function expects a value to be between zero and one. Tell me why this is, with perhaps a textbook reference going more in depth. I don’t mind that your measurements fall outside the traditionally accepted bounds of reality. I mind that there’s no systematic explanation why this is.
  • Oddball time-series. In modeling, we want to see a lot of things over time. The more fine-grained our models become, the more fine-grained we want our measurement data. If I’m simulating the growth of a crop, I would like to see measurements of the crop while it grows as well as after it’s done growing. It’s hard to determine a trend if one only has one to two datapoints to work from.
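For what it’s worth, two of the gripes above are cheap to check once you know what a file is supposed to contain: the 0-to-360 versus -180-to-180 longitude mix-up, and values that wander outside zero to one. Here’s a minimal Python sketch. The function names are mine and purely illustrative, and it assumes longitudes are plain degrees and that the quantity really is meant to live in [0, 1]; neither is something a mystery file will tell you.

```python
def to_signed_longitude(lon_deg):
    """Map a longitude in degrees onto the [-180, 180) convention."""
    # Shift by 180, wrap into [0, 360), then shift back.
    return ((lon_deg + 180.0) % 360.0) - 180.0


def to_east_longitude(lon_deg):
    """Map a longitude in degrees onto the [0, 360) 'degrees east' convention."""
    # Python's % returns a non-negative result for a positive modulus.
    return lon_deg % 360.0


def out_of_unit_range(values):
    """Return (index, value) pairs for every value outside [0, 1]."""
    return [(i, v) for i, v in enumerate(values) if not 0.0 <= v <= 1.0]


if __name__ == "__main__":
    print(to_signed_longitude(270.0))   # 270 degrees east, signed convention
    print(to_east_longitude(-90.0))     # 90 degrees west, 0-360 convention
    print(out_of_unit_range([0.2, -0.05, 1.0, 1.3]))
```

This only flags the oddballs; it can’t explain them. Whether a value of 1.3 is measurement noise, a different unit, or a typo is still a question for whoever collected the data.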

I suppose the grand summary of this griping is that as a modeler, I have my hands full with a lot of things. Spending weeks trying to work with bad data is frustrating; human hands entered the data into a computer, so why can’t those same hands explain the data?

Fracking GIS.

I have never, in my 20-year history of computing, ever come across a bit of software as remarkably terrible as ArcGIS. Ever.

That is all.

Progress bars

Welcome to the fun and glamorous side of computers!
- my Dad, whenever I was frustrated with progress bars as a child

ESRI, makers of ArcMap: Why, oh why, can’t you at least multi-thread ArcMap? Or make it 64-bit. Or *something*. I’m severely not a fan of ArcMap taking 5,000 years to complete a spatial join. Yes, I know I’m doing this on a 1.83 GHz Mac Mini. But it has two cores, and it’s 64-bit. And it runs MATLAB very nicely, thankyouverymuch, so it’s not impotent by any means.

I’m going to grow old and die before I have the 2000-2005 temperate forest cover change mapped into a 0.5° grid. Gaah.

Ode to C8H10N4O2

O! Brackish water, full of fury!
Warms my brain, on mornings cold.
Gives force to thought when mine eyes are dreary,
Lording o’er beverages, with taste so bold.

Contained by glass after falling down,
Made from earth, so finely ground.
With cream makes the color pleasant brown,
Without which my face becomed frown’d.

I’m a Mac. And I’m a PC.

I’ve been using Windows 7 a lot lately.  In fact, I’ve taken to bringing my dual boot MacBook from home to work every day as I find myself hunched over the small thing most of the time doing work.  It has a smaller keyboard, wireless network (as opposed to 1 Gbps ethernet), and a 13” screen as opposed to my desktop’s huge 24” Dell.

I’m a proud Mac fanboy.  When it works, it works well.  Things have a very high value of JustWork, and more or less I’ve been very pleased with it.

My problem comes into being with trying to interact a lot with more of the outside world.  Office 2008 for the Mac manages to snatch Defeat from the Jaws of Victory.  It’s slow and buggy.  The PDFs that Apple products create (such as from Pages) tend to be incorrect when viewed from Adobe products.  For better or worse, ESRI’s ArcGIS consumes enough resources that it cannot be run in a virtual machine on my Mac.

All of these issues were still “put-up-able” enough while the choice was XP or Vista versus MacOS.  I like how Apple has set up the user interface.  That plus a couple of free programs (like Blacktree’s excellent Quicksilver launcher application) mean that I can really focus on doing things without having to think about how I’m doing it.

Now it seems that Microsoft has managed to figure out most of what’s wrong with using Windows with Windows 7.  The interface for most everything I’m using is regular and normal. Its search functionality is as good as Apple’s Spotlight (which admittedly was a Vista feature), and all in all it has a very high value of JustWork.  It moves the major annoyances of computer use from the OS (to which MacOS was the cure), to the applications I’m using.

I’m using Windows 7 right now.  And I like it.

Reflections

I haven’t updated in a while. Originally, I was going to say that the
guest house internet rather sucks, but then I could blog like I did
from the plane- just blog into a text file, and then copy and paste it
like a good boy when I get to internet (in the office).

Honestly? I’m lazy. I’ve been working hard, and generally the time
where I think abstractly enough for blogging tends to be at awkward
times, like in an auto or while walking across campus. However, I
need to write this down. Not only will my Aunt Donna be upset if I
don’t update, but this is for me as well; life here has
changed the way I see the world, really for the better. The result is
that as I blogged like crazy at the start of my trip, so too will I
blog at the end, starting here.

Since my last posting, there’s been a world of things I’ve done. I
can drink tap water now, for starters. I don’t make a habit of it,
but if it’s all that’s available, it’s all that’s available. That’s a
big one, honestly.

A friend of mine and I were talking today, and I brought up that in
coming to India, I steeled myself for The Big Things. I haven’t
really felt any major culture shock, and that is due in large
part to my mentally preparing myself for going from Champaign, IL to
Bangalore, India. However, there’s a million and one small things,
the type that don’t rear their heads and make themselves noticed. For
example, there was something off about India for the first
couple of weeks I was here. I couldn’t tell what it was, but
something felt wrong on a basic, fundamental level.

“What was it?”, asks the audience.

Stairs. Holes. Ledges, bumps, all manner of things that would get
any business or government in the US in large amounts of trouble with
regards to handicap accessibility. India is not the
place for the wheelchair-bound. As a walking person, it’s something
you don’t notice, but for us Americans, our world is typically flat.
Sidewalks are smooth, there are generally smooth(ish) inclines
anywhere you need to get to with minor elevation changes, and public
places either have complex ramps or an elevator for large elevation
changes. It’s really a small thing for me, but it was bugging the
hell out of me until I figured it out. It’s one of a million things
that have struck me about this city.

This city is very much organic; I don’t see any reasonable way that
anyone regulates, taxes, or does much of anything with at least 75% of
the individuals and businesses here. Better question: I don’t know if
there’s any authority that knows that 75% of the individuals and
businesses here even exist. Sure, they know there’s a lot of
people here and it’s a big city, but beyond that, what then? After
much of the touristing I’ve done here over the weekend (more on that
later), I imagine the proper question is: do I know the identity of
every cell in my body?

Big problem, and a wishing for right now…

If I could have a *single* set of things right now…

…I’d have a miniature coffee maker and some grounds right here right now.

If wishes were horses…

…I’d have a lot of butterflies. Or something like that.

I’ve discovered why you don’t drink the tap water in India. Last Friday, I got back to the guest house rather late, so the night shift was there. They’re not quite as on the ball as the day shift. I asked for a bottle of water, as my Friday Night Adventures had left me quite parched. Normally, what they do is they go in the kitchen, and run the filter into a recycled 1 liter bottle. This time, he went back, and came out with a bottle of water, like what normally happens. It was warmer than what usually comes from the filter, but it was water, so I didn’t complain. I got back to my room and immediately chugged at least a quarter of that water…

…and noticed an odd smell about the water. Much like the odd smell of the water that comes from any given tap around here. Crap.

Over the weekend, I was sick enough to make it quite the effort to go out and eat at the guest house dining service, but I was well enough so that even after catching up on some work and some movies I brought with, I was going stir-crazy. I suppose it could have been worse; I’ve been slowly working my way up on street-vendor food here. The food at the guest house and the nicer restaurants I’ve eaten at is all pretty clean; if I hadn’t been having the street vendor food, I most likely would have been completely unprepared for the tap water. That would have been Very Bad.

Other than that, I rather like it here. I believe myself to be the only white American here; I’ve made friends with an Englishman, a Frenchman, several Indians, an Argentinean, a Sri-Lankan Australian, an Israeli, and I’m sure I’m forgetting a few. It’s exciting to feel like I’m very clearly the dumbest one here; dinner conversation has gone from advanced pure mathematics, to the economy, to climate change, to applications of nano-scale fuel cells (and the design problems going into them), to effects of climate change on India’s hydrological cycle, and, as usual, lots of hypotheses of how various factors (climate change, phase of the moon, etc) affect the results of cricket matches. “Your favorite team sucking” is very clearly out of bounds here. I feel safer in India than I do back home, but I think an Indian would rather you disparage his/her mother instead of their national cricket team.

And *that* is the State of Matt in India.

Researchdog Thousandaire

India is an interesting country. If I were from an earlier time, I think “queer” would be a great word for it. The woman who tends the garden outside my window makes about 70 Rs./day. I’ve so far averaged about 100-120 Rs./day in spending, and I think I’m spending a lot until I notice a wallet stuffed to the gills with 500 and 1000 Rs. notes. (Note to the reader: 1 rupee (Rs) is about 2 cents). The place I’m staying at would likely be costing me 500-1000 Rs/night, or more, if I were paying all my own way. It makes me feel funny to think about not just that I’m making more than someone, but just how crazy well-off I am.

Alex told me to “live like a pimp.” I probably will… once I’m not in neighborhoods where I’m stepping over cow dung in the street while children play on a pile of trash piled up so high against their house they can roll a wheel up to the roof. On the flip side, I can get a samosa and an orange Fanta for 26 rupees. If you’re paying attention to the titles of my previous posts, the Fanta is 20 Rs., thus leaving a very tasty samosa for 6 Rs. (12 cents, for those keeping score). Final verdict: In the slums, I’m going to take advantage of my relative wealth while not being an ass about it. They need the rupees more than I would like a Fanta, so everyone wins there. Once I get out to the better off bar district, I can do as Alex requests :)

Leaving by the main, south entrance to the institute takes me out onto a busy street lined with barbed-wire concrete walls for miles. However, this morning I discovered The Back Entrance, to the north. Walking out there, I managed to discover the aforementioned street poo and the children with the trash heap. I also discovered many busy blocks of little shops, including the one I got my Fanta and samosa from. There are also a few Airtel and Vodafone shops there, where I’ll probably get a cheap (1200 Rs) pay-as-I-go phone to use around town.

The IIS campus (a.k.a. “The Tata Institute” to local cabbies) is almost something out of Star Trek: The Next Generation. The crew beams down to a planet, where everyone speaks decent-enough to flawless English, and everything’s the same as Back Home, But Different. Where they bother with landscaping, it’s lovely. Where they don’t, it’s native jungle that hasn’t been cleared away (their “quad” is a thick forest with some walking paths through it). All the women here are wearing saris. All the women. Sometimes it feels like a restored university in the Fallout universe- almost everything’s rusty, and there are Random Concrete Things just off the beaten path through the forest that suggest there might have been a building or something large there. There are other walking paths that trail off into nothing, with a set of broken-down stone stairs set into a small hillside… simply in the middle of nowhere. It’s as if the institute simply picked its battles with time and budget… and Time here is not one to be trifled with lightly.

20 cent Fantas

Abject poverty, sheer highway terror, and camels.

I’ve been outside of the airport for an hour now, and it’s
been amazingly interesting already. To begin, I found the
cab arranged for me from the airport. It’s some little 4-door
hatchback Tata, proudly proclaiming its V-2 engine on the side
fender. This is where I learned How Indians Drive. Did you know that
two of these size cars will fit side-by-side in the same lane? Or
that you can more or less get two auto-rickshaws and my cab
in the same lane? Or that you only need approximately one meter of
clearance between two trucks carrying OMG HUGE SLABS OF STONE to cut
between them at 100 kph? Upon learning this, I tried finding my
seatbelt. While the belt is there, it was lacking a buckle. Figures.

Anyway, once we got up to speed, traffic was a bit more calm and
orderly; we were only sharing a lane with one other car at a time.
This gave me the chance to look around. As we were leaving the
airport, everything was under construction. Not in the Chicago
sense of “everything’s under construction Just Because”, but
because everything is being created where there once was
nothing.
Vast expanses of red clay, palm trees, and half-built
things with their attendant cranes and other construction gear. As we
got into the city, things no longer looked like they were being built; it was more like some
force had given up on tearing them down. I honestly don’t know how to
describe it, and something tells me I haven’t seen the worst by a long
shot.

It’s now 10:55 am, March 26th, 2009 (Indian Standard Time), and I
just took my shoes off. The last time I took them off, it was around
3pm on the 23rd at O’Hare Airport. I was tempted to try an
international call to my parents, but I just checked my world clock,
and it’s past midnight where they are. Weird. It
doesn’t feel like I’ve just come completely around to the far
side of the world. I’d try getting online about now, but the power
died to the building just as I was about to connect. Welcome to the
3rd World!