Tag Archives: thesis
Here is an interview with Prof. Rob J. Hyndman, who has created many time series forecasting methods and authored books as well as R packages on the subject.
Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year, which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million, which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net).
Here is an interview with one of the younger researchers and rock stars of the R Project, John Myles White, co-author of Machine Learning for Hackers.
Ajay- What inspired you guys to write Machine Learning for Hackers? What has been the public response to the book? Are you planning to write a second edition or a next book?
John- We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.
Ajay- What are the key things that a potential reader can learn from this book?
John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known, but are just as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVMs. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.
Ajay- Describe your journey as a science student up till your PhD. What are your current research interests and what initiatives have you taken with them?
John- As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?
Ajay- How can academia and private sector solve the shortage of trained data scientists (assuming there is one)?
John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.
(TIL he has played in several rock bands!)
I almost missed out on the R Journal for this month- great reading,
and I liked Dr Hadley Wickham’s article on the stringr package the best. A really, really useful package, and nice writing too.
(Incidentally, I just downloaded a local copy of his ggplot2 website at http://had.co.nz/ggplot2/ggplot-static.zip
and I aim to really read up on that one.)
Okay, announcement time
I just signed a contract with Springer for a book on R, sometime in the first half of 2011:
“R for Business Analytics”
It’s going to take more of a business analytics than a stats perspective (I am an MBA / Mech Engineer),
and the use cases will be business analytics cases. Do write to me if you need help doing some analytics in R (business use cases), or want something featured. The big focus will be on GUIs and easier analytics, using the Einsteinian principle of making things as simple as possible, but no simpler.
Step 1 is to create internal motivation to create a blog in the first place
Step 2 is to find what to write
Reasons Bloggers Blog-
Examples- I hate that the Facebook Platform team treats me badly with waits, and breaks my code.
SAS Marketing won’t give me a big discount to make me look good in front of my boss.
Companies won’t give me their software for free- even though I will use it to make money (and not play Xbox).
Google won’t do this- Apple won’t do that- Microsoft won’t do those.
Revolution would give me 4 great packages but not the source code for RevoScaleR (which only 300 people would understand in the first place).
I better kiss the Professor and give him a turkey for dinner, as he sits on my thesis committee.
I will recommend Prof X’s lousy book in the hope he recommends my lousy book as a textbook too.
It is safe to laugh when the boss is making a joke-I should comment on her corporate blog, and retweet her.
I belong to this great online community of smart people. Let me agree to what they say.
I really believe in EVERYTHING that ALL the 2 MILLION members of the community have to say ALL the TIME.
I belong to this online community because all my friends are on my computer.
My blog page rank is now X plus delta tau because of sugary key words (2004)
My technorati numbers rise (2005)
I was once on Digg (2007)
I have Z * exp N followers on Twitter and even more on Facebook (2008)
My Klout is increasing on Twitter, my Stack Overflow reputation’s cup floweth over. (2009)
I got time to kill- and I think I may learn more, meet interesting people and discover something wandering on the internet.
Not all those who wander are lost- Wikiquote
I got a story to tell, poems to write, code to give away. A free blog is something whose value a Chinese, an Iranian and a North Korean really, really know.
But after all that, WHY Do Bloggers Blog?
- Because we are still waiting for Facebook to create the Blog Killer.
- It’s better than saying I am unemployed and a social loner
- Reddit Karma feels good. Any Karma of any kind.
Curt Monash at Monash Research pointed out some ongoing open source GPL issues for WordPress and the Thesis issue (Also see http://ma.tt/2009/04/oracle-and-open-source/ and http://www.mattcutts.com/blog/switching-things-around/).
As a user of both for upwards of 2 years, I believe open source and GPL license enforcement are general parts of the software strategy of most software companies nowadays. Some thoughts on open source and software strategy-
- Thesis remains a very, very popular theme and has earned upwards of $100,000 for its creator (an estimate based on 20k-plus installs and a $60 average price).
- Little guys like to give away code to get some satisfaction/recognition; big guys give away free code only when it’s necessary or when they are not making money in that product segment anyway.
- As Ethan Hunt said, “Every hero needs a villain”. Every software (market share) war between players needs one big company holding more market share and an open source strategy from the other player, who is not able to create the code in house and so effectively outsources it by creating an open source project. But the same open source proponent rarely gives away the secret to its own money-making project.
- Examples- Google creates open source Android, but won’t reveal its secret algorithm for search, which drives its main profits,
- Google again puts out a paper on MapReduce, but it’s Yahoo that champions Hadoop,
- Apple creates open source projects (http://www.apple.com/opensource/) but won’t give away its operating system source code (why?), which helps people buy its more expensive hardware,
- IBM, which helped kickstart the whole proprietary code thing (remember MS DOS), is the new champion of open source (http://www.ibm.com/developerworks/opensource/) and
- Microsoft continues to spark open source debate but read http://blogs.technet.com/b/microsoft_blog/archive/2010/07/02/a-perspective-on-openness.aspx and also http://www.microsoft.com/opensource/
- SAS gives away a lot of open source code (read Jim Davis, CMO, SAS, here), but will stick to Base SAS code (even though it seems to be making more money from its verticals focus and data mining).
- SPSS was the first big analytics company to help support R (open source stats software), but will cling to its own code for its own software.
- WordPress.org gives away its software as open source (and I like Akismet just as well as blogging), but hey, anyone who is on WordPress.com knows how locked in you can get by its (pricey) platform.
- Vendor Lock-in (wink wink price escalation) is the elephant in the room for Big Software Proprietary Companies.
- SLA quality, maintenance and IP safety are mostly the uh-oh when going in for open source software.
- Lack of IP protection for revenue models for open source code is the big bottleneck for a lot of companies- as very few software users know what to do with source code if you give it to them anyway.
- If companies were confident that they would still be earning the same revenue and there would be less leakage or theft, they would gladly give away the source code.
- Derivative software or extensions help popularize the original software.
- Halfway steps like Facebook Applications (the original big company to create a platform for third-party creators), iPhone apps and Android applications show the success of creating APIs to help protect IP and software control while still giving some freedom to developers.
- Alternate user interfaces to R in both SAS/IML and JMP are a similar example.
- Basically, open source is mostly done by the underdog while the top dog mostly rakes in money (and envy).
- There is yet to be a big commercial success in open source software, though there is very good open source software. Just as Google’s success helped establish advertising as an alternate (and now dominant) revenue source for online companies, open source needs a big example of a company that made billions while giving source code away and still retaining control and direction of software strategy.
- Open source people love to hate proprietary packages, yet there are more shades of grey (than black and white) and hypocrisy (read lies) within the open source software movement than the regulated world of big software. People will be still people. Software is just a piece of code. ;)