Tag Archives: algorithms
What is an algorithm anyway?
As per Wikipedia (http://en.wikipedia.org/wiki/Algorithm), an algorithm is a step-by-step procedure for calculations. Algorithms are used for calculation, data processing, and automated reasoning.
An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Starting from an initial state and initial input (perhaps empty), the instructions describe a computation that, when executed, proceeds through a finite number of well-defined successive states, eventually producing “output” and terminating at a final ending state. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate random input.
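To make the definition concrete, here is Euclid's algorithm for the greatest common divisor, a textbook example chosen for illustration (it is not part of the quoted Wikipedia text). Each pass through the loop is one well-defined state transition, and the procedure must terminate because the second number strictly decreases until it reaches zero. A minimal Python sketch, assuming non-negative integer inputs:

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm for non-negative integers a and b."""
    # Each iteration is one state transition: (a, b) -> (b, a mod b).
    # Since a % b < b, the second value strictly decreases, so the loop ends.
    while b != 0:
        a, b = b, a % b
    # Final state: b == 0 and a holds the output.
    return a


print(gcd(1071, 462))  # prints 21
```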
Where do I hear the word algorithm being used? Or the wat er cooler version- algols
- PageRank – how Google ranks search results
- Public key cryptography – keeping credit card data secure
- Correcting errors (in CDs)
- Protecting passwords (cryptographic hash function)
- Perlin noise: generating landscapes in games
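As a rough illustration of the first item, here is a minimal power-iteration version of PageRank in Python. It assumes a tiny in-memory link graph and the commonly cited damping factor of 0.85; Google's production ranking uses many more signals and is not public, so treat this as a sketch of the published PageRank idea only. The function name `pagerank` and the example graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a small graph (illustrative sketch).

    links maps each page to the list of pages it links to. Pages with no
    outgoing links (dangling pages) spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Total rank currently held by dangling pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation share plus the redistributed dangling rank.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes a damped share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print({page: round(score, 3) for page, score in ranks.items()})
```

In this three-page graph, C collects links from both A and B, so it should end up with the highest score. Spreading the rank of dangling pages evenly keeps the scores summing to 1.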
But the Google Books Ngram Viewer shows the word algorithms as flat in books,
and Google Trends thinks the word is actually declining, though India remains a top source of searches for algorithms.
But algorithms are increasing in arXiv articles,
and there is a bit of up and down in algorithm jobs.
What do you think- do you hear the word too much or too little?
Ajay- Why did you choose Rapid Miner and R? What were the other software alternatives you considered and discarded?
Analyst- We considered most of the other major players in statistics/data mining or enterprise BI. However, we found the value proposition of an open source solution too compelling to justify the premium pricing that the commercial solutions would have required. The widespread adoption of R, and the variety of packages and algorithms available for it, made it an easy choice. We liked RapidMiner as a way to design structured, repeatable processes, and for its ability to optimize learner parameters in a systematic way. It also handled large data sets better than R on 32-bit Windows did. The GUI, particularly since version 5.0 was released, made it more usable than R for analysts who weren’t experienced programmers.
Ajay- What analytics do you think RapidMiner and R are best suited for?
Analyst- We use RM+R mainly for sports analysis so far, rather than for more traditional business applications. It has been quite suitable for that, and I can easily see how it would be used for other types of applications.
Ajay- Any experiences as an enterprise customer? How was the installation process? How good is the enterprise level support?
Analyst- Rapid-I has been one of the most responsive tech companies I’ve dealt with, either in my current role or with previous employers. They are small enough to be able to respond quickly to requests, and in more than one case, have fixed a problem, or added a small feature we needed within a matter of days. In other cases, we have contracted with them to add larger pieces of specific functionality we needed at reasonable consulting rates. Those features are added to the mainline product, and become fully supported through regular channels. The longer consulting projects have typically had a turnaround of just a few weeks.
Ajay- What challenges, if any, did you face in implementing a pure open source analytics bundle?
Analyst- As Rapid-I is a smaller company based in Europe, the availability of training and consulting in the USA isn’t as extensive as for the major enterprise software players, and the time zone differences sometimes slow down the communications cycle. There were times when we were the first customer to attempt a specific integration point in our technical environment, and with no prior experiences to fall back on, we had to work with Rapid-I to figure out how to do it. Compared to what traditional software vendors provide, both R and RM tend to have sparse, terse, occasionally incomplete documentation. The situation is getting better, but it still lags behind the traditional enterprise software vendors.
Ajay- What are the things you do in R, and what do you prefer to do in RapidMiner (a comparison of technical synergies)?
Analyst- Our experience has been that RM is superior to R at writing and maintaining structured processes, better at handling larger amounts of data, and more flexible at fine-tuning model parameters automatically. The biggest limitation we’ve had with RM compared to R is that R has a larger library of user-contributed packages for additional data mining algorithms. Sometimes we opted to use R because RM hadn’t yet implemented a specific algorithm. The introduction of the R extension has allowed us to combine the strengths of both tools in a very logical and productive way.
In particular, extending RapidMiner with R helped address RM’s weakness in the breadth of algorithms, because it brings the entire R ecosystem into RM (similar to how Rapid-I implemented much of the Weka library early on in RM’s development). Further, because the R user community releases packages that implement new techniques faster than the enterprise vendors can, this helps turn a potential weakness into a potential strength. However, R packages tend to be of varying quality, and are more prone to go stale due to lack of support/bug fixes. This depends heavily on the package’s maintainer and its prevalence of use in the R community. So when RapidMiner has a learner with a native implementation, it’s usually better to use it than the R equivalent.
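The systematic parameter optimization the Analyst credits RapidMiner with amounts, at its core, to trying each candidate setting, scoring it with cross-validation, and keeping the best. The sketch below shows that idea in plain Python for the single parameter k of a k-nearest-neighbor classifier on one-dimensional toy data. It is a hypothetical illustration of the general technique (grid search with leave-one-out cross-validation), not how RapidMiner or any R package implements it; all function names and data are made up for the example.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out point i
        if knn_predict(train, x, k) == label:
            correct += 1
    return correct / len(data)


def grid_search(data, k_values):
    """Score every candidate k and return (best_k, scores); ties go to the first k."""
    scores = {k: loo_accuracy(data, k) for k in k_values}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical toy data: (feature, label) pairs.
data = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (1.0, "b"), (1.1, "b"), (1.2, "b")]
best_k, scores = grid_search(data, [1, 3, 5])
print(best_k, scores)
```

A real grid would span several parameters at once (the Cartesian product of their candidate values), which is why automating the search inside a repeatable process is valuable.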
An awesome conference for awesome software. RapidMiner remains one of the leading enterprise-grade open source software packages, and it can help you do a lot of things, including flow-driven data modeling, web mining and web crawling, that even other software can’t. Highlights of the program include:
- Mining Machine 2 Machine Data (Katharina Morik, TU Dortmund University)
- Handling Big Data (Andras Benczur, MTA SZTAKI)
- Introduction of RapidAnalytics at Telenor (Telenor and United Consult)
- and more
Here is the complete program:
09:00 – 10:30
Ingo Mierswa (Rapid-I)
Resource-aware Data Mining or M2M Mining (Invited Talk)
Katharina Morik (TU Dortmund University)
NeurophRM: Integration of the Neuroph framework into RapidMiner
To be announced (Invited Talk)
Extending RapidMiner with Recommender Systems Algorithms
Implementation of User Based Collaborative Filtering in RapidMiner
Parallel Training / Workshop Session
10:30 – 11:00
11:00 – 12:30
Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner
Customers’ LifeStyle Targeting on Big Data using Rapid Miner
Robust GPGPU Plugin Development for RapidMiner
Optimization Plugin For RapidMiner
Image Mining Extension – Year After
Incorporating R Plots into RapidMiner Reports
12:30 – 13:30
13:30 – 15:30
Parallel Training / Workshop Session
Introduction of RapidAnalytics Enterprise Edition at Telenor Hungary
Application of RapidMiner in Steel Industry Research and Development
A Comparison of Data-driven Models for Forecast River Flow
Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner
An Octave Extension for RapidMiner
Processing Data Streams with the RapidMiner Streams-Plugin
Automated Creation of Corpuses for the Needs of Sentiment Analysis
Demonstration: News from the Rapid-I Labs
This short session demonstrates the latest developments from the Rapid-I lab and will show you how you can build powerful analysis processes and routines using those RapidMiner tools.
15:30 – 16:00
16:00 – 18:00
Book Presentation and Game Show
Data Mining for the Masses: A New Textbook on Data Mining for Everyone
Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems.
Get some Coffee for free – Writing Operators with RapidMiner Beans
Meta-Modeling Execution Times of RapidMiner operators
Conference day ends at ca. 17:00.
Social Event (Conference Dinner)
Social Event (Visit of Bar District)
You should also have a look at https://rapid-i.com/rcomm2012f/index.php?option=com_content&view=article&id=65
The conference is in Budapest, Hungary, Europe.
(Disclaimer: RapidMiner is an advertising sponsor of Decisionstats.com, in case you did not notice the two banner-sized ads.)
Amazon gets some competition from Google’s cloud offerings, and customers should see some relief, unless Google withdraws its commitment to these products after a few years of trying (as it often does now!).
Machine Type Pricing

| Configuration | Virtual Cores | Memory | GCEU * | Local disk | Price/Hour | $/GCEU/hour |
|---|---|---|---|---|---|---|
| n1-standard-1-d | 1 | 3.75GB *** | 2.75 | 420GB *** | $0.145 | 0.053 |
| n1-standard-8-d | 8 | 30GB | 22 | 2 x 1770GB | $1.16 | 0.053 |
| Egress | Price |
|---|---|
| Egress to the same Zone | Free |
| Egress to a different Cloud service within the same Region | Free |
| Egress to a different Zone in the same Region (per GB) | $0.01 |
| Egress to a different Region within the US | $0.01 **** |
| Inter-continental Egress | At Internet Egress Rate |

Internet Egress per GB

| Monthly volume | Americas/EMEA destination | APAC destination |
|---|---|---|
| 0-1 TB in a month | $0.12 | $0.21 |
Persistent Disk Pricing

| Item | Price |
|---|---|
| Provisioned space | $0.10 per GB/month |
| Snapshot storage ** | $0.125 per GB/month |
| IO Operations | $0.10 per million |
IP Address Pricing

| IP address type | Price |
|---|---|
| Static IP address (assigned but unused) | $0.01 per hour |
| Ephemeral IP address (attached to instance) | Free |
** coming soon
*** 1GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates
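The $/GCEU/hour column is simply the hourly price divided by the GCEU rating (0.145 / 2.75 and 1.16 / 22 both come to about 0.053). The sketch below uses the listed 2012 prices to estimate a monthly bill. Assumptions: a 730-hour month, persistent disk billed per provisioned GB per month, Americas/EMEA internet egress within the 0-1 TB tier (taking 1 TB as 1024 GB), and no snapshot or IP address charges. The function names are hypothetical, and current Google pricing differs.

```python
# 2012 list prices copied from the tables above (assumed unchanged for this sketch).
INSTANCE_PRICE_PER_HOUR = {"n1-standard-1-d": 0.145, "n1-standard-8-d": 1.16}
GCEU = {"n1-standard-1-d": 2.75, "n1-standard-8-d": 22}
PD_PER_GB_MONTH = 0.10            # provisioned persistent disk
IO_PER_MILLION = 0.10             # persistent disk IO operations
EGRESS_AMERICAS_EMEA_PER_GB = 0.12  # internet egress, 0-1 TB tier


def price_per_gceu_hour(machine):
    """Hourly price divided by the GCEU rating (the $/GCEU/hour column)."""
    return INSTANCE_PRICE_PER_HOUR[machine] / GCEU[machine]


def monthly_estimate(machine, hours, disk_gb, io_millions, egress_gb):
    """Rough monthly bill; ignores snapshots, static IPs and non-internet egress."""
    if egress_gb > 1024:
        # Only the 0-1 TB tier is listed; assume 1 TB = 1024 GB.
        raise ValueError("egress above 1 TB uses tiers not listed here")
    costs = {
        "compute": INSTANCE_PRICE_PER_HOUR[machine] * hours,
        "persistent_disk": PD_PER_GB_MONTH * disk_gb,
        "io": IO_PER_MILLION * io_millions,
        "egress": EGRESS_AMERICAS_EMEA_PER_GB * egress_gb,
    }
    costs["total"] = sum(costs.values())  # summed before "total" is added
    return costs


bill = monthly_estimate("n1-standard-1-d", hours=730, disk_gb=100,
                        io_millions=10, egress_gb=50)
print(round(price_per_gceu_hour("n1-standard-1-d"), 3), round(bill["total"], 2))
```

For one n1-standard-1-d running all month with 100 GB of persistent disk, 10 million IO operations and 50 GB of internet egress, the estimate comes to about $122.85, most of it compute.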
Google Prediction API
Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.
Leverage machine learning without the complexity
Use the familiar RESTful interface
Enter input in any format – numeric or text
Build smart apps
Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.
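How the Prediction API works internally is not described here, so as a self-contained stand-in for the kind of task it advertises (spam detection or document classification), here is a minimal multinomial naive Bayes text classifier with add-one smoothing in Python. The class name, training sentences and labels are hypothetical; this illustrates a classic classification technique, not Google’s implementation or its REST interface.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in doc.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum of smoothed log P(word | class)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = ["win cash now", "cheap pills win", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("win cheap cash"), model.predict("lunch meeting"))
```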
Google Translation API
Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.
Translate text into other languages programmatically
Use the familiar RESTful interface
Take advantage of Google’s powerful translation algorithms
Build multilingual apps
Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.
Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.
Reliable & Secure
Complete peace of mind as your data is automatically replicated across multiple sites and secured using access control lists.
You can store up to hundreds of terabytes, paying only for what you use.
Run ad hoc SQL queries on multi-terabyte datasets in seconds.
Google App Engine
Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.
Focus on your apps
Let us worry about the underlying infrastructure and systems.
See your applications scale seamlessly from hundreds to millions of users.
Premium paid support and 99.95% SLA for business users
Here is an interview with Jason Kuo, who works with SAP Analytics as Group Solutions Marketing Manager. Jason answers questions on SAP Analytics and its increasing involvement with the R statistical language.
Ajay- What made you choose R as the language to tie together important parts of your technology platform, like HANA and SAP Predictive Analysis? Did you consider other languages like Julia or Python?
Jason- It’s the most popular. Over 50% of statisticians and data analysts use R. With 3,500+ algorithms, it’s arguably the most comprehensive statistical analysis language. That said, we are not closing the door on others.
Ajay- When did you first start getting interested in R as an analytics platform?
Jason- SAP has been tracking R for 5+ years. With R’s explosive growth over the last year or two, it made sense for us to dramatically increase our investment in R.
Ajay- Can we expect SAP to give back to the R community, like Google and Revolution Analytics do, by sponsoring package development or sponsoring user meetups and conferences?
Will we see SAP’s R HANA package at this year’s R conference, useR! 2012 in Nashville?
Jason- Yes. We plan to provide a specific driver for HANA tables for input of the data to native R. This is planned for the end of 2012. We’ll then review our event strategy. SAP has been a sponsor of Predictive Analytics World for several years and was indeed a founding sponsor. We may be attending this year’s R conference in Nashville.
Ajay- What has been some of the initial customer feedback to your analytics expansion and offerings?
Jason- We have completed two very successful Pilots of the R Integration for HANA with two of SAP’s largest customers.
Jason has over 15 years of BI and data warehousing industry experience. Having worked at Oracle, Business Objects, and now SAP, Jason has been involved in numerous technical marketing roles involving performance management dashboards, information management, text analysis, predictive analytics, and now big data. He has a bachelor of science in operations research from the University of Michigan.