Tag Archives: gnu
A bit belatedly I return to my second favorite Office Productivity Software (the first being Cloud- Google Docs).
July 9, 2011
The registration for the LibreOffice Conference, taking place in Paris from October 12th to 15th, is now open. Everyone interested in joining the first annual meeting of the LibreOffice community is invited to register online at
to help the organizers in planning.
The LibreOffice Conference will be the event for those interested in the development of free office productivity software, open standards, and the OpenDocument format generally, and is an exciting opportunity to meet community members, developers and hackers. It is sponsored by Cap Digital, Région Île de France, IRILL, Canonical, Google, La Mouette, Novell/SUSE, Red Hat, AF 83, Ars Aperta and Lanedo.
The Call for Papers is also open until July 22nd, and paper submissions will be reviewed by a community committee.
We look forward to meeting you in the heart of France, celebrating the first year of LibreOffice, and discussing the plans for the coming months.
The Steering Committee of The Document Foundation
Official LibreOffice Conference
Please enter your personal data to register for Paris, Oct 12 – 15, 2011.
List of All Libre Office Announcements-
I just checked out this new software for making PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/), which is headed by Robert Grossman, who was the first proponent of using R on Amazon EC2.
Probably someone like Zementis (http://adapasupport.zementis.com/) can use this to further test, enhance or benchmark on EC2. They did have a joint webinar with Revolution Analytics recently.
- Augustus v 0.4.3.1 has been released
- Added a guide (pdf) for including Augustus in the Windows System Properties.
- Updated the install documentation.
- Augustus 2010.II (Summer) release is available. This is v 0.4.2.0. More information is here.
- Added performance discussion concerning the optional cyclic garbage collection.
See Recent News for more details and all recent news.
Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.
There is also a version for use with PMML 3 models. It is able to produce and consume models with 10,000s of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.
Augustus is written in Python and is freely available under the GNU General Public License, version 2.
See the page Which version is right for me for more details regarding the different versions.
Predictive Model Markup Language (PMML) is an XML mark up language to describe statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily inter-operate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).
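To make the description above concrete, here is a minimal sketch of what a PMML document looks like and how a consumer might inspect it. The XML fragment is hand-written for illustration (it is not Augustus output), the field and model names are invented, and the `describe` helper is hypothetical; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML 4.0 fragment (assumed names, not Augustus output).
PMML_TEXT = """\
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Toy example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="is_fraud" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="toy_tree" functionName="classification">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="is_fraud" usageType="predicted"/>
    </MiningSchema>
    <Node score="no">
      <True/>
      <Node score="yes">
        <SimplePredicate field="amount" operator="greaterThan" value="1000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_0"}


def describe(pmml_text):
    """Return the PMML version, the declared input fields and the model types."""
    root = ET.fromstring(pmml_text)
    fields = [
        (field.get("name"), field.get("dataType"))
        for field in root.findall("pmml:DataDictionary/pmml:DataField", NS)
    ]
    # Model elements are direct children of <PMML> whose tag ends in "Model".
    models = [child.tag.split("}")[1] for child in root if child.tag.endswith("Model")]
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = describe(PMML_TEXT)
print(summary)
```

The data dictionary declares the inputs and the model element carries the parameters, so a producer and a consumer only need to agree on this file, not on each other's code.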
For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.
Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.
It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.
The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.
Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models, Baseline (Change Detection) Models, Tree Models, Regression Models and Naive Bayes Models. The typical model development and use cycle with Augustus is as follows:
- Identify suitable data with which to construct a new model.
- Provide a model schema which prescribes the requirements for the model.
- Run the Augustus producer to obtain a new model.
- Run the Augustus consumer on new data to effect scoring.
Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models and Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specification of the configuration files and model schema is detailed below. The consumers provide for some configurability of the output, but users will often provide additional post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data, but users may need to provide their own preprocessing to accommodate their particular data source.
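The produce-then-consume cycle described above can be sketched in a few lines of plain Python. This is not the Augustus API: `produce_baseline` and `consume` are hypothetical stand-ins, the "model" is a plain dictionary rather than a PMML file, and a simple per-segment z-score is assumed here as the change-detection statistic.

```python
from statistics import mean, stdev


def produce_baseline(rows):
    """'Producer' step: fit one mean/std baseline per segment from (segment, value) rows."""
    by_segment = {}
    for segment, value in rows:
        by_segment.setdefault(segment, []).append(value)
    model = {}
    for segment, values in by_segment.items():
        # stdev needs at least two points; fall back to 0.0 for tiny segments.
        spread = stdev(values) if len(values) > 1 else 0.0
        model[segment] = {"mean": mean(values), "std": spread}
    return model


def consume(model, rows, threshold=3.0):
    """'Consumer' step: score new rows against the stored per-segment baseline."""
    results = []
    for segment, value in rows:
        params = model.get(segment)
        if params is None or params["std"] == 0:
            # Unknown segment or degenerate baseline: no score can be computed.
            results.append({"segment": segment, "value": value, "z": None, "alert": False})
            continue
        z = (value - params["mean"]) / params["std"]
        results.append({"segment": segment, "value": value, "z": z, "alert": abs(z) > threshold})
    return results


# Made-up training data with two segments, then new data to score.
training = [("east", 10.0), ("east", 12.0), ("east", 11.0),
            ("west", 100.0), ("west", 104.0), ("west", 96.0)]
model = produce_baseline(training)
scores = consume(model, [("east", 11.5), ("west", 130.0), ("north", 5.0)])
for row in scores:
    print(row)
```

Keeping the model as a separate artifact between the two steps is the point of the design: the data used to build it and the data being scored never have to be in the same process.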
In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.
Augustus can accommodate a post-processing step. While not necessary, it is often useful to:
- Re-normalize the scoring results or perform an additional transformation.
- Supplement the results with global metadata such as timestamps.
- Format the results.
- Select certain interesting values from the results.
- Restructure the data for use with other applications.
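A post-processing step along the lines listed above might look like the following sketch. It assumes the scoring output has already been read into a list of dictionaries with a numeric `score` key; that shape, the field names and the helpers `post_process` and `to_csv` are all hypothetical and are not part of Augustus.

```python
import csv
import io
from datetime import datetime, timezone


def post_process(results, keep=("segment", "score"), run_time=None):
    """Select fields, re-normalize scores to [0, 1] and stamp each row with a run time."""
    if not results:
        return []
    run_time = run_time or datetime.now(timezone.utc)
    raw_scores = [r["score"] for r in results]
    low, high = min(raw_scores), max(raw_scores)
    span = high - low
    processed = []
    for r in results:
        row = {key: r[key] for key in keep}                         # select values
        row["score_norm"] = (r["score"] - low) / span if span else 0.0  # re-normalize
        row["scored_at"] = run_time.isoformat()                     # global metadata
        processed.append(row)
    return processed


def to_csv(rows):
    """Restructure the rows as CSV text for use with other applications."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up scoring output; "debug" is dropped because it is not in `keep`.
raw = [
    {"segment": "east", "score": 2.0, "debug": "x"},
    {"segment": "west", "score": 6.0, "debug": "y"},
    {"segment": "north", "score": 4.0, "debug": "z"},
]
fixed_time = datetime(2011, 7, 9, tzinfo=timezone.utc)
rows = post_process(raw, run_time=fixed_time)
print(to_csv(rows))
```

Passing `run_time` explicitly keeps the output reproducible; leaving it out stamps the rows with the current UTC time.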
- Revolution R, PMML and ADAPA: Webinar April 13 (revolutionanalytics.com)
- Predicting R models with PMML: Revolution R Enterprise and ADAPA (revolutionanalytics.com)
- In case you missed it: March Roundup (revolutionanalytics.com)
Well, I have played with software (mostly but not exclusively analytical), and I admire the zeal and energy of both open source and closed source practitioners, all of them relatively decent people either executing the strategies their investors or owners tell them to (closed source) or motivated by their own sense of cool, change-the-world openness (open source).
What I don't get is people stealing open source code, repackaging it without adding major contributions, claiming patent-pending stuff, and basically making money by creating CLOSED source from open source software (as open source is yet to break the enterprise glass ceiling).
You are either open source or you aren't.
Bisexuality is okay. Bi-codability is not.
Next time you see someone stealing some community’s open source code- refer to this excellent link.
Violations of the GNU Licenses
- Does the distribution contain a copy of the License?
- Does it clearly state which software is covered by the License? Does it say anything misleading, perhaps giving the impression that something is covered by the License when in fact it is not?
- Is source code included in the distribution?
- Is a written offer for source code included with a distribution of just binaries?
- Is the available source code complete, or is it designed for linking in other non-free modules?
If there seems to be a real violation, the next thing you need to do is record the details carefully:
- the precise name of the product
- the name of the person or organization distributing it
- email addresses, postal addresses and phone numbers for how to contact the distributor(s)
- the exact name of the package whose license is violated
- how the license was violated:
- Is the copyright notice of the copyright holder included?
- Is the source code completely missing?
- Is there a written offer for source that’s incomplete in some way? This could happen if it provides a contact address or network URL that’s somehow incorrect.
- Is there a copy of the license included in the distribution?
- Is some of the source available, but not all? If so, what parts are missing?
The more of these details that you have, the easier it is for the copyright holder to pursue the matter.
Once you have collected the details, you should send a precise report to the copyright holder of the packages that are being misused. The copyright holder is the one who is legally authorized to take action to enforce the license.
If the copyright holder is the Free Software Foundation, please send the report to <email@example.com>. It’s important that we be able to write back to you to get more information about the violation or product. So, if you use an anonymous remailer, please provide a return path of some sort. If you’d like to encrypt your correspondence, just send a brief mail saying so, and we’ll make appropriate arrangements.
Note that the GPL, and other copyleft licenses, are copyright licenses. This means that only the copyright holders are empowered to act against violations. The FSF acts on all GPL violations reported on FSF copyrighted code, and we offer assistance to any other copyright holder who wishes to do the same.
But, we cannot act on our own if we do not hold copyright. Thus, be sure to find out who the copyright holders of the software are before reporting a violation.
Here is an interview with noted analytics consultant and trainer Dean Abbott. Dean is scheduled to lead a workshop on Predictive Analytics at PAW (Predictive Analytics World Conference) on Oct 18, 2010, in Washington, D.C.
Ajay- Describe your upcoming hands-on workshop at Predictive Analytics World and how it can help people learn more predictive modeling.
Dean- The hands-on workshop is geared toward individuals who know something about predictive analytics but would like to experience the process. It will help people in two regards. First, by going through the data assessment, preparation, modeling and model assessment stages in one day, the attendees will see how predictive analytics works in reality, including some of the pain associated with false starts and mistakes. At the same time, they will experience success with building reasonable models to solve a problem in a single day. I have found that for many, having to actually build the predictive analytics solution is an eye-opener. Seeing demonstrations shows the capabilities of a tool, but the greater value for an end user is developing the intuition of what to do at each stage of the process, which makes the theory of predictive analytics real.
Second, they will gain experience using a top-tier predictive analytics software tool, Enterprise Miner (EM). This is especially helpful for those who are considering purchasing EM, but also for those who have used open source tools and have never experienced the additional power and efficiencies that come with a tool that is well thought out from a business solutions standpoint (as opposed to an algorithm workbench).
Ajay- You are an instructor with software ranging from SPSS, S Plus, SAS Enterprise Miner, Statistica and CART. What features of each software do you like best, and which are more suited for application in data cases?
Dean- I’ll add Tibco Spotfire Miner, Polyanalyst and Unica’s Predictive Insight to the list of tools I’ve taught “hands-on” courses around, and there are at least a half dozen more I demonstrate in lecture courses (JMP, Matlab, Wizwhy, R, Ggobi, RapidMiner, Orange, Weka, RandomForests and TreeNet to name a few). The development of software is a fascinating undertaking, and each tool has its own strengths and weaknesses.
I personally gravitate toward tools with data flow / icon interface because I think more that way, and I’ve tired of learning more programming languages.
Since the predictive analytics algorithms are roughly the same (backprop is backprop no matter which tool you use), the key differentiators are
(1) how data can be loaded in and how tightly integrated can the tool be with the database,
(2) how well big data can be handled,
(3) how extensive are the data manipulation options,
(4) how flexible are the model reporting options, and
(5) how can you get the models and/or predictions out.
There are vast differences in the tools on these matters, so when I recommend tools for customers, I usually interview them quite extensively to understand better how they use data and how the models will be integrated into their business practice.
A final consideration is related to the efficiency of using the tool: how much automation can one introduce so that user-interaction is minimized once the analytics process has been defined. While I don’t like new programming languages, scripting and programming often helps here, though some tools have a way to run the visual programming data diagram itself without converting it to code.
Ajay- What are your views on the increasing trend of consolidation and mergers and acquisitions in the predictive analytics space? Does this increase the need for vendor-neutral analysts and consultants as well as conferences?
Dean- When companies buy a predictive analytics software package, it’s a mixed bag. SPSS’s purchase of Clementine was ultimately good for predictive analytics, though it took several years for SPSS to figure out what they wanted to do with it. Darwin ultimately disappeared after being purchased by Oracle, but the newer Oracle data mining tool, ODM, integrates better with the database than Darwin did or even would have been able to.
The biggest trend and pressure for the commercial vendors is the improvements in the Open Source and GNU tools. These are becoming more viable for enterprise-level customers with big data, though from what I’ve seen, they haven’t caught up with the big commercial players yet. There is great value in bringing both commercial and open source tools to the attention of end-users in the context of solutions (rather than sales) in a conference setting, which I think is an advantage that Predictive Analytics World has.
As a vendor-neutral consultant, flux is always a good thing because I have to be proficient in a variety of tools, and it is the breadth that brings value for customers entering into the predictive analytics space. But it is very difficult to keep up with the rapidly-changing market and that is something I am weighing myself: how many tools should I keep in my active toolbox.
Ajay- Describe your career and how you came into the Predictive Analytics space. What are your views on the various MS Analytics programs offered by universities?
Dean- After getting a master’s degree in Applied Mathematics, my first job was at a small aerospace engineering company in Charlottesville, VA called Barron Associates, Inc. (BAI); it is still in existence and doing quite well! I was working on optimal guidance algorithms for some developmental missile systems, and statistical learning was a key part of the process, so I cut my teeth on pattern recognition techniques there, and frankly, that was the most interesting part of the job. In fact, most of us agreed that this was the most interesting part: John Elder (Elder Research) was the first employee at BAI, and was there at that time. Gerry Montgomery and Paul Hess were there as well and left to form a data mining company called AbTech and are still in the analytics space.
After working at BAI, I had short stints at Martin Marietta Corp. and PAR Government Systems, where I worked on analytics solutions in DoD, primarily radar and sonar applications. It was while at Elder Research in the 90s that I began working in the commercial space, more in financial and risk modeling, and then in 1999 I began working as an independent consultant.
One thing I love about this field is that the same techniques can be applied broadly, and therefore I can work on CRM, web analytics, tax and financial risk, credit scoring, survey analysis, and many more applications, and cross-fertilize ideas from one domain into other domains.
Regarding MS degrees, let me first write that I am very encouraged that data mining and predictive analytics are being taught in specific classes and programs rather than as just an add-on to an advanced statistics or business class. That stated, I have mixed feelings about analytics offerings at universities.
I find that most provide a good theoretical foundation in the algorithms, but are weak in describing the entire process in a business context. For those building predictive models, the model-building stage nearly always takes much less time than getting the data ready for modeling and reporting results. These are cross-discipline tasks, requiring some understanding of the database world and the business world for us to define the target variable(s) properly and clean up the data so that the predictive analytics algorithms work well.
The programs that have a practicum of some kind are the most useful, in my opinion. There are some certificate programs out there that have more of a business-oriented framework, and the NC State program builds an internship into the degree itself. These are positive steps in the field that I’m sure will continue as predictive analytics graduates become more in demand.
DEAN ABBOTT is President of Abbott Analytics in San Diego, California. Mr. Abbott has over 21 years of experience applying advanced data mining, data preparation, and data visualization methods in real-world data intensive problems, including fraud detection, response modeling, survey analysis, planned giving, predictive toxicology, signal processing, and missile guidance. In addition, he has developed and evaluated algorithms for use in commercial data mining and pattern recognition products, including polynomial networks, neural networks, radial basis functions, and clustering algorithms, and has consulted with data mining software companies to provide critiques and assessments of their current features and future enhancements.
Mr. Abbott is a seasoned instructor, having taught a wide range of data mining tutorials and seminars for a decade to audiences of up to 400, including DAMA, KDD, AAAI, and IEEE conferences. He is the instructor of well-regarded data mining courses, explaining concepts in language readily understood by a wide range of audiences, including analytics novices, data analysts, statisticians, and business professionals. Mr. Abbott also has taught both applied and hands-on data mining courses for major software vendors, including Clementine (SPSS, an IBM Company), Affinium Model (Unica Corporation), Statistica (StatSoft, Inc.), S-Plus and Insightful Miner (Insightful Corporation), Enterprise Miner (SAS), Tibco Spotfire Miner (Tibco), and CART (Salford Systems).