Pofftastic: Can I fit you into a pretty equation?

Tuesday, February 22, 2011


Discrete Choice Analysis.

It is currently all the rage in human behavioral models. Using it, we researchers hope to fit people's discrete choices (discrete means separate or distinct, as opposed to continuous) into nice equations. The equations are called utility functions. Each distinct choice (e.g., blue pen vs. red pen vs. pencil) is associated with a utility function that depends on explanatory variables that have been shown to be significant for your choice set (e.g., gender, income, education level, price of the blue pen, stated preference toward the color blue on a scale from 1-5). These utilities are then converted into the probability that you will choose the blue pen over the other options.
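To make the pen example concrete, here is a minimal Python sketch of the multinomial logit form, the most common way of turning utilities into choice probabilities: P(j) = exp(V_j) / sum over k of exp(V_k). The coefficients, prices, and the 1-5 preference value are made up for illustration; they are not estimated from any data.

```python
import math

# Hypothetical coefficients for illustration only; a real study would estimate these.
BETA_PRICE = -0.8      # utility lost per dollar of price
BETA_BLUE_PREF = 0.5   # utility gained per point of stated preference for blue (1-5 scale)
ASC = {"blue pen": 0.0, "red pen": 0.0, "pencil": -0.3}  # alternative-specific constants

def systematic_utility(alt, price, blue_pref):
    """Observed (systematic) part of utility V for one alternative."""
    v = ASC[alt] + BETA_PRICE * price
    if alt == "blue pen":
        v += BETA_BLUE_PREF * blue_pref
    return v

def choice_probabilities(prices, blue_pref):
    """Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k)."""
    v = {alt: systematic_utility(alt, p, blue_pref) for alt, p in prices.items()}
    vmax = max(v.values())  # subtract the max before exponentiating, for numerical stability
    expv = {alt: math.exp(x - vmax) for alt, x in v.items()}
    total = sum(expv.values())
    return {alt: e / total for alt, e in expv.items()}

probs = choice_probabilities({"blue pen": 1.50, "red pen": 1.50, "pencil": 0.50}, blue_pref=4)
for alt, p in probs.items():
    print(f"{alt}: {p:.3f}")
```

Changing a price or the stated blue preference shifts probability between the options, but the three probabilities always sum to 1.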

The first assumption of Discrete Choice Analysis is that people act rationally and choose the alternative with the highest utility. So, there may be a problem there, huh? Given discrete choice analysis, I would assume that most people would stay home on snowy days... but perhaps their utility functions are vastly different from mine...
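In the random utility framing behind most discrete choice models, the "rational" rule is deterministic for each person: pick the option with the highest total utility. The probabilities come from the part of utility the analyst cannot observe. The sketch below assumes that unobserved part is an independent standard Gumbel draw for each person and option, which is the assumption that yields the logit formula; the utilities for "drive" and "stay home" are invented numbers.

```python
import math
import random

def gumbel_draw(rng):
    """Standard Gumbel draw via the inverse CDF: -ln(-ln(U)), U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:          # random() can return 0.0, and ln(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_shares(v, n_people, seed=0):
    """Each simulated person picks the option with the highest V + Gumbel error."""
    rng = random.Random(seed)
    counts = {alt: 0 for alt in v}
    for _ in range(n_people):
        total = {alt: vj + gumbel_draw(rng) for alt, vj in v.items()}
        counts[max(total, key=total.get)] += 1
    return {alt: c / n_people for alt, c in counts.items()}

def logit_shares(v):
    """Closed-form multinomial logit probabilities for the same systematic utilities."""
    vmax = max(v.values())
    e = {alt: math.exp(x - vmax) for alt, x in v.items()}
    s = sum(e.values())
    return {alt: x / s for alt, x in e.items()}

v = {"drive": 1.0, "stay home": 0.5}   # hypothetical systematic utilities
simulated = simulate_shares(v, 100_000)
closed_form = logit_shares(v)
for alt in v:
    print(f"{alt}: simulated {simulated[alt]:.3f}, logit {closed_form[alt]:.3f}")
```

Every simulated person is a strict utility maximizer, yet the shares match the logit probabilities closely, so someone who does stay home on a snowy day is not necessarily irrational; the unobserved part of their utility simply differs from yours.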

But somewhere along the way I am put into a class of people with the same explanatory variable values as mine (the same income, education, gender, household structure, occupation, etc.). Someone in that class answered a survey at some point, indicated they prefer blue to red, and bought 20 blue pens this year but only 2 red ones. So statistically I should be more likely to purchase a blue pen than a red pen??

Ah, well, it's close enough most of the time, and after all, we do have to predict driver decisions. (Believe me, your transportation models have gotten much better as a result of discrete choice analysis.) It is used to predict mode splits (car vs. bus vs. carpool vs. train) and route choices, and it has made it possible to analyze potential changes to the system (new roads, new lanes, new buses, fare increases) and policy decisions (gas taxes, congestion pricing, toll roads, etc.). Its predecessor was the gravity model (and yes, it's modeled on Newton's law of gravitation), but that may be for another post.
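As a rough illustration of that kind of policy analysis, the sketch below computes mode shares for a tiny made-up sample of travelers with a multinomial logit model, then recomputes them after a hypothetical bus fare increase. Averaging individual probabilities over a sample like this (often called sample enumeration) is one common way to turn a disaggregate model into aggregate mode splits. Every coefficient, cost, and travel time here is invented.

```python
import math

# Hypothetical coefficients for illustration; a real model would be calibrated to survey data.
BETA_COST = -0.4   # utility per dollar of out-of-pocket cost
BETA_TIME = -0.05  # utility per minute of travel time
ASC = {"car": 0.0, "bus": -0.5, "carpool": -1.0, "train": -0.3}  # alternative-specific constants

def mode_probabilities(cost, time):
    """Multinomial logit probabilities for one traveler."""
    v = {m: ASC[m] + BETA_COST * cost[m] + BETA_TIME * time[m] for m in ASC}
    vmax = max(v.values())
    e = {m: math.exp(x - vmax) for m, x in v.items()}
    total = sum(e.values())
    return {m: x / total for m, x in e.items()}

def mode_shares(travelers, bus_fare):
    """Aggregate mode split: average the individual probabilities over the sample."""
    shares = {m: 0.0 for m in ASC}
    for t in travelers:
        cost = dict(t["cost"], bus=bus_fare)   # every traveler faces the same bus fare
        p = mode_probabilities(cost, t["time"])
        for m in shares:
            shares[m] += p[m] / len(travelers)
    return shares

# Two made-up travelers: costs in dollars (bus fare added above), times in minutes
travelers = [
    {"cost": {"car": 6.0, "carpool": 3.0, "train": 2.5},
     "time": {"car": 20, "bus": 35, "carpool": 25, "train": 30}},
    {"cost": {"car": 9.0, "carpool": 4.5, "train": 2.5},
     "time": {"car": 35, "bus": 45, "carpool": 40, "train": 40}},
]

before = mode_shares(travelers, bus_fare=1.50)
after = mode_shares(travelers, bus_fare=2.50)   # hypothetical fare increase
for m in ASC:
    print(f"{m:8s} {before[m]:.3f} -> {after[m]:.3f}")
```

In a multinomial logit model, the share the bus loses is redistributed to the other modes in proportion to each traveler's existing probabilities (the independence of irrelevant alternatives property), which is one of the model's well-known limitations.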

If you're interested in seeing a plan in the works, take a look at http://www.slideshare.net/nashvillempo/nashville-mpo-2035-plan-policy-initiatives
It is Nashville's long-range transportation and land use plan, presented in every college student's favorite medium: the slideshow!

So you tell me: Can we fit you into a pretty equation and predict what choices you will make? If you had the data available, what choices would you like to build models for?

I'll leave you with a picture of my personal favorite form of transportation:

You were expecting a bicycle, weren't you? Hehe.

6 comments:

Anonymous, February 22, 2011 at 9:57 PM
Do your models have the capability to incorporate new information when it becomes available (i.e., learn), or do you have to rebuild the model... or do you ever update it? How does what you do relate to Bayesian networks? .... and I'll stop now lol, I'm making a conscious effort to stop typing... :P
catbus ftw! ^.^
Maritza Barrera, LMFT, February 23, 2011 at 4:16 AM
As a now professional (oh yeah, I had to throw that in... tee hee) in the field, I can honestly say that human behavior, and the human psyche in general, is the most simplistic yet complex system we choose to study. This has been demonstrated time and again by randomized studies and their statistical significance (which often yields only marginal results).
Just my opinion, of course.
Keep up the awesome blog!
Poff, February 23, 2011 at 11:58 AM
@Angry Hamster: Currently, in practice, Bayesian networks are not used in conjunction with discrete choice for estimating driver choice. I think it would be a neat addition, and some work is being done in the field. One source I came across, though, feared that using a Bayesian approach might trap a user into a certain route (model-wise), since it operates on the one-generation-of-memory rule. Maybe just using it to update the coefficients of the explanatory variables would be a better approach. I haven't read much into it yet, although it's funny that you asked, because I've been meaning to and have been spouting off about it to my office mates for the last month.

I did find a cool source about using Markov chain Monte Carlo (MCMC) algorithms to draw choices from a colossal choice set without having to enumerate the utilities and probabilities of every potential choice. Dealing with large choice sets is becoming much more of an issue in transportation because of the desire to model choices more realistically. Instead of computing just a route choice or a mode choice, modelers now want to compute activity-travel patterns, which include sets of activities (their type, location, and duration) and sets of trips (with their possible modes and routes). The mathematical modeling is starting to mirror the actual human decision much more closely rather than vaguely approximating it. For example, you may be deciding whether to take the bus or your car to campus. Current methods in practice take your socioeconomic characteristics and the bus's attributes into account. Cutting-edge research also takes into account all the activities you plan to do that day, since you may skip the bus because of a grocery shopping trip planned on your way home from campus.
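The general idea can be sketched with a plain Metropolis-Hastings sampler. This is a generic textbook illustration, not the specific algorithm from the source mentioned above, and it assumes a choice set of daily activity patterns (each activity either included or not) with a made-up utility function. The key point is that the acceptance step needs only the utility difference between the current pattern and a proposed neighbor, so the sampler visits patterns with frequency proportional to exp(V) without ever computing the logit denominator over all of them.

```python
import itertools
import math
import random

def pattern_utility(pattern, gains, crowding):
    """Hypothetical utility of a daily activity pattern (a 0/1 vector): each included
    activity adds its gain, and a quadratic penalty on the number of activities
    stands in for time pressure."""
    k = sum(pattern)
    return sum(g for g, x in zip(gains, pattern) if x) - crowding * k * k

def metropolis_inclusion_probs(gains, crowding, n_steps, seed=0):
    """Metropolis-Hastings over patterns with target P(pattern) proportional to exp(V).
    Only utility differences are needed, so the 2**len(gains) patterns are never enumerated."""
    rng = random.Random(seed)
    n = len(gains)
    state = [0] * n
    v = pattern_utility(state, gains, crowding)
    burn_in = n_steps // 10
    counts = [0] * n
    for step in range(n_steps):
        i = rng.randrange(n)              # symmetric proposal: toggle one activity
        state[i] ^= 1
        v_new = pattern_utility(state, gains, crowding)
        if rng.random() < math.exp(min(0.0, v_new - v)):
            v = v_new                     # accept the move
        else:
            state[i] ^= 1                 # reject: undo the toggle
        if step >= burn_in:
            for j in range(n):
                counts[j] += state[j]
    kept = n_steps - burn_in
    return [c / kept for c in counts]

def exact_inclusion_probs(gains, crowding):
    """Brute-force enumeration, feasible only for small choice sets; used as a check."""
    n = len(gains)
    z = 0.0
    totals = [0.0] * n
    for pattern in itertools.product((0, 1), repeat=n):
        w = math.exp(pattern_utility(pattern, gains, crowding))
        z += w
        for j in range(n):
            totals[j] += w * pattern[j]
    return [t / z for t in totals]

gains = [1.2, 0.8, 0.5, 0.3, -0.2, 0.9, 0.1, 0.4]   # made-up activity utilities
sampled = metropolis_inclusion_probs(gains, crowding=0.1, n_steps=200_000, seed=1)
exact = exact_inclusion_probs(gains, crowding=0.1)
for j, (s, e) in enumerate(zip(sampled, exact)):
    print(f"activity {j}: MCMC {s:.3f}, exact {e:.3f}")
```

With 8 activities there are only 256 patterns, so brute-force enumeration is still possible and serves as a check. With 40 activities there would be about 10^12 patterns, while the sampler's cost per step grows only linearly with the number of activities.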
Anonymous, February 23, 2011 at 6:12 PM
@Poff: i read your post and i immediately thought "bayesian networks" (then i had to google discrete choice ;)).
ability to easily add new data as it becomes available and "retrain" your system would be useful. traffic isn't guaranteed to be static (^.^). if you were able to recycle your models, it could save both time and money. for example, if some major construction (a new mall or something) affects traffic in a part of the city you already worked on, you would already have a model and would just need new data.
i am not sure how useful it is to include a large amount of detail in your models. you still need realistic data (that you need to collect somehow) to base your model on. how are you going to (from a practical point of view) gather enough data on people planning grocery trips or feeling good and wanting to take a bus to campus?
ah research! :D just figuring that part out would be a pretty cool project ;)
Poff, February 24, 2011 at 7:34 AM
Very good questions. I will be researching Bayes for human behavior modeling today. I'm also researching fuzzy logic control, so if you have any insight into that, it would be much appreciated.

As far as gathering data, my best answer would be survey data. Of course, it would all be modeled using answers to stated preference questions rather than observed (actual) choices, since the behavior I'm trying to model is for a connected car system that has not been deployed (although I hear there's a testbed somewhere in Michigan). For my research, conducting a survey is probably not a reasonable goal because of money and time constraints, so hopefully I can find an existing dataset (if I use discrete choice, which I'm not entirely sure I will) that is free or relatively cheap. Census data is also an option, although the explanatory variables would then be limited to sociodemographic information. Since I'm just trying to develop a framework, though, it might be feasible to use fake data for demonstration purposes.
Poff, February 25, 2011 at 7:29 AM
The more I think about it, the more I think I could use Bayesian estimation in place of the maximum likelihood estimation used to calibrate the multinomial logit model. The process is: utility function for each choice -> probability of choosing each option -> assume the errors follow a Gumbel distribution -> multinomial logit -> solve for the parameters using maximum likelihood. The end results are the parameters that give 'weight' to each explanatory variable (well, really the log(variable)). Instead of getting a point value from maximum likelihood, though, I could get the distribution of each parameter through Bayes and update the parameter distributions as new information arrives. That would be a good way to propagate uncertainty. (I hate passing point values through, but I tend to do it when it's easier.)
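Here is a minimal sketch of that idea for a deliberately simple case: one cost coefficient in a multinomial logit model, fake stated-preference-style data, and a posterior computed on a grid of candidate values instead of by MCMC. The Normal prior, the true coefficient of -0.4, the cost range, and the batch sizes are all assumptions for illustration. It contrasts the maximum likelihood point estimate with a full posterior distribution, and shows that feeding the posterior from one batch in as the prior for the next batch gives the same answer as fitting both batches at once.

```python
import math
import random

BETA_GRID = [-2.0 + 0.005 * i for i in range(401)]   # candidate cost coefficients, -2.0 to 0.0

def simulate_responses(beta_true, n, n_alts=3, seed=0):
    """Fake data: each respondent sees n_alts options with random costs and picks one
    according to a multinomial logit with coefficient beta_true (an assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        costs = [rng.uniform(1.0, 10.0) for _ in range(n_alts)]
        utilities = [beta_true * c for c in costs]
        m = max(utilities)
        weights = [math.exp(u - m) for u in utilities]
        chosen = rng.choices(range(n_alts), weights=weights)[0]
        data.append((costs, chosen))
    return data

def log_likelihood(beta, data):
    """MNL log-likelihood: sum of V_chosen - log(sum_k exp(V_k)) over respondents."""
    ll = 0.0
    for costs, chosen in data:
        v = [beta * c for c in costs]
        m = max(v)
        ll += v[chosen] - (m + math.log(sum(math.exp(x - m) for x in v)))
    return ll

def normalize(log_weights):
    """Turn log weights on the grid into probabilities that sum to 1."""
    m = max(log_weights)
    w = [math.exp(x - m) for x in log_weights]
    s = sum(w)
    return [x / s for x in w]

def update(prior, data):
    """Bayes' rule on the grid: log posterior = log prior + log-likelihood (+ const)."""
    log_post = [
        math.log(p) + log_likelihood(b, data) if p > 0.0 else -math.inf
        for p, b in zip(prior, BETA_GRID)
    ]
    return normalize(log_post)

def mean_and_sd(dist):
    mean = sum(p * b for p, b in zip(dist, BETA_GRID))
    var = sum(p * (b - mean) ** 2 for p, b in zip(dist, BETA_GRID))
    return mean, math.sqrt(var)

# Normal(-0.5, 1) prior on the cost coefficient, discretized onto the grid
prior = normalize([-0.5 * (b + 0.5) ** 2 for b in BETA_GRID])

batch1 = simulate_responses(beta_true=-0.4, n=300, seed=1)
batch2 = simulate_responses(beta_true=-0.4, n=300, seed=2)

mle = max(BETA_GRID, key=lambda b: log_likelihood(b, batch1 + batch2))  # point estimate
post1 = update(prior, batch1)   # posterior after the first batch
post2 = update(post1, batch2)   # that posterior becomes the prior for the second batch

print("MLE (grid search):", round(mle, 3))
print("posterior mean, sd after batch 1: %.3f, %.3f" % mean_and_sd(post1))
print("posterior mean, sd after batch 2: %.3f, %.3f" % mean_and_sd(post2))
```

A grid only works for one or two parameters; with many explanatory variables you would typically switch to MCMC or another sampler, but the sequential-update logic stays the same.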