Home » Posts tagged 'JSON'
Tag Archives: JSON
BigML has created a marketplace for selling Datasets and Models. This is a first (?) as the closest market for Predictive Analytics till now was Rapid Miner’s marketplace for extensions (at http://rapidupdate.de:8180/UpdateServer/faces/index.xhtml)
SELL YOUR DATA
You can make your Dataset public. Mind you: the Datasets we are talking about are BigML’s fancy histograms. This means that other BigML users can look at your Dataset details and create new models based on this Dataset. But they can not see individual records or columns or use it beyond the statistical summaries of the Dataset. Your Source will remain private, so there is no possibility of anyone accessing the raw data.
SELL YOUR MODEL
Now, once you have created a great model, you can share it with the rest of the world. For free or at any price you set.Predictions are paid for in BigML Prediction Credits. The minimum price is ‘Free’ and the maximum price indicated is 100 credits.
White Box Models
Clicking on the white open lock will open up your model to the rest of the world. Anyone can now buy your model, explore it, use it to make predictions
Black Box Models
If you choose the black box setting (the black open lock icon), other BigML users will NOT be able to view or clone your model, but they will be able to use it to make predictions.
DOWNLOAD YOUR MODEL
BigML.com have added downloads to our models. Simply choose the format you want and you can copy/paste the code or text. There is a range of formats that they offer currently: JSON PML, PMML, Python, Ruby, Objective-C, Java, the rules of the decision tree in plain text and a Summary overview of your model. Around the corner are MS Excel downloads and R (of course!).
PUBLICIZE YOUR MODEL
There’s also an ‘embed’ function, so now you can embed the little poster of your model in your blog post or website, so it is easy to share it in your own environment.
It is nice to see Models and Data getting the APPY treatment and hopefully, it will encourage other vendors Iike Google Prediction API etc to further spend thought and effort to reward data mining individuals directly without going through corporate intermediaries while ensuring intellectual property safeguards .
An R package market for enterprises? for Python libraries? JMP addins? A market for SAS Macros- who knows what the future shall hold. But overall, this is a very positive step by the BigML.com team. The App marketplace has helped revolutionize mobile and desktop computing and hopefully it will do the same for Business Analytics.
Here is an interview with Charlie Parker, head of large scale online algorithms at http://bigml.com
Ajay- Describe your own personal background in scientific computing, and how you came to be involved with machine learning, cloud computing and BigML.com
Charlie- I am a machine learning Ph.D. from Oregon State University. Francisco Martin (our founder and CEO), Adam Ashenfelter (the lead developer on the tree algorithm), and myself were all studying machine learning at OSU around the same time. We all went our separate ways after that.
Francisco started Strands and turned it into a 100+ million dollar company building recommender systems. Adam worked for CleverSet, a probabilistic modeling company that was eventually sold to Cisco, I believe. I worked for several years in the research labs at Eastman Kodak on data mining, text analysis, and computer vision.
When Francisco left Strands to start BigML, he brought in Justin Donaldson who is a brilliant visualization guy from Indiana, and an ex-Googler named Jose Ortega who is responsible for most of our data infrastructure. They pulled in Adam and I a few months later. We also have Poul Petersen, a former Strands employee, who manages our herd of servers. He is a wizard and makes everyone else’s life much easier.
Ajay- You use clojure for the back end of BigML.com .Are there any other languages and packages you are considering? What makes clojure such a good fit for cloud computing ?
Charlie- Clojure is a great language because it offers you all of the benefits of Java (extensive libraries, cross-platform compatibility, easy integration with things like Hadoop, etc.) but has the syntactical elegance of a functional language. This makes our code base small and easy to read as well as powerful.
We’ve had occasional issues with speed, but that just means writing the occasional function or library in Java. As we build towards processing data at the Terabyte level, we’re hoping to create a framework that is language-agnostic to some extent. So if we have some great machine learning code in C, for example, we’ll use Clojure to tie everything together, but the code that does the heavy lifting will still be in C. For the API and Web layers, we use Python and Django, and Justin is a huge fan of HaXe for our visualizations.
Ajay- Current support is for Decision Trees. When can we see SVM, K Means Clustering and Logit Regression?
Charlie- Right now we’re focused on perfecting our infrastructure and giving you new ways to put data in the system, but expect to see more algorithms appearing in the next few months. We want to make sure they are as beautiful and easy to use as the trees are. Without giving too much away, the first new thing we will probably introduce is an ensemble method of some sort (such as Boosting or Bagging). Clustering is a little further away but we’ll get there soon!
Ajay- How can we use the BigML.com API using R and Python.
Charlie- We have a public github repo for the language bindings. https://github.com/bigmlcom/io Right now, there there are only bash scripts but that should change very soon. The python bindings should be there in a matter of days, and the R bindings in probably a week or two. Clojure and Java bindings should follow shortly after that. We’ll have a blog post about it each time we release a new language binding. http://blog.bigml.com/
Ajay- How can we predict large numbers of observations using a Model that has been built and pruned (model scoring)?
Charlie- We are in the process of refactoring our backend right now for better support for batch prediction and model evaluation. This is something that is probably only a few weeks away. Keep your eye on our blog for updates!
Ajay- How can we export models built in BigML.com for scoring data locally.
Charlie- This is as simple as a call to our API. https://bigml.com/developers/models The call gives you a JSON object representing the tree that is roughly equivalent to a PMML-style representation.
You can read about Charlie Parker at http://www.linkedin.com/pub/charles-parker/11/85b/4b5 and the rest of the BigML team at
I had a chance to dekko the new startup BigML https://bigml.com/ and was suitably impressed by the briefing and my own puttering around the site. Here is my review-
1) The website is very intutively designed- You can create a dataset from an uploaded file in one click and you can create a Decision Tree model in one click as well. I wish other cloud computing websites like Google Prediction API make design so intutive and easy to understand. Also unlike Google Prediction API, the models are not black box models, but have a description which can be understood.
2) It includes some well known data sources for people trying it out. They were kind enough to offer 5 invite codes for readers of Decisionstats ( if you want to check it yourself, use the codes below the post, note they are one time only , so the first five get the invites.
BigML is still invite only but plan to get into open release soon.
3) Data Sources can only be by uploading files (csv) but they plan to change this hopefully to get data from buckets (s3? or Google?) and from URLs.
4) The one click operation to convert data source into a dataset shows a histogram (distribution) of individual variables.The back end is clojure , because the team explained it made the easiest sense and fit with Java. The good news (?) is you would never see the clojure code at the back end. You can read about it from http://clojure.org/
As cloud computing takes off (someday) I expect clojure popularity to take off as well.
Clojure is a dialect of Lisp
5) As of now decision trees is the only distributed algol, but they expect to roll out other machine learning stuff soon. Hopefully this includes regression (as logit and linear) and k means clustering. The trees are created and pruned in real time which gives a slightly animated (and impressive effect). and yes model building is an one click operation.
The real time -live pruning is really impressive and I wonder why /how it can ever be replicated in other software based on desktop, because of the sheer interactive nature.
Making the model is just half the work. Creating predictions and scoring the model is what is really the money-earner. It is one click and customization is quite intuitive. It is not quite PMML compliant yet so I hope some Zemanta like functionality can be added so huge amounts of models can be applied to predictions or score data in real time.
If you are a developer/data hacker, you should check out this section too- it is quite impressive that the designers of BigML have planned for API access so early.
BigML.io gives you:
- Secure programmatic access to all your BigML resources.
- Fully white-box access to your datasets and models.
- Asynchronous creation of datasets and models.
- Near real-time predictions.
Note: For your convenience, some of the snippets below include your real username and API key.
Please keep them secret.
BigML.io conforms to the design principles of Representational State Transfer (REST). BigML.io is enterely HTTP-based.
BigML.io gives you access to four basic resources: Source, Dataset, Model and Prediction. You cancreate, read, update, and delete resources using the respective standard HTTP methods: POST, GET,PUT and DELETE.
All access to BigML.io must be performed over HTTPS
and https://bigml.com/developers/quick_start ( In think an R package which uses JSON ,RCurl would further help in enhancing ease of usage).
Overall a welcome addition to make software in the real of cloud computing and statistical computation/business analytics both easy to use and easy to deploy with fail safe mechanisms built in.
Check out https://bigml.com/ for yourself to see.
The invite codes are here -one time use only- first five get the invites- so click and try your luck, machine learning on the cloud.
If you dont get an invite (or it is already used, just leave your email there and wait a couple of days to get approval)
There are multiple packages in R to read data straight from online datasets.
These are as follows- (more…)
You can go to https://code.google.com/apis/console/b/0/
Unlike Android and other free stuff these APIs are very promising for revenue generation as some of them are very unique to Google itself, and already some are being offered on a Pricing Tier. There are 18 APIs in total with 3 APIs having Pricing while the rest are in beta stages.
I am just listing down all the APIs in one place – (more…)