Home » Posts tagged 'Computer science'
Tag Archives: Computer science
Here is an interview with Dan Steinberg, Founder and President of Salford Systems (http://www.salford-systems.com/ )
Ajay- Describe your journey from academia to technology entrepreneurship. What are the key milestones or turning points that you remember.
Dan- When I was in graduate school studying econometrics at Harvard, a number of distinguished professors at Harvard (and MIT) were actively involved in substantial real world activities. Professors that I interacted with, or studied with, or whose software I used became involved in the creation of such companies as Sun Microsystems, Data Resources, Inc. or were heavily involved in business consulting through their own companies or other influential consultants. Some not involved in private sector consulting took on substantial roles in government such as membership on the President’s Council of Economic Advisors. The atmosphere was one that encouraged free movement between academia and the private sector so the idea of forming a consulting and software company was quite natural and did not seem in any way inconsistent with being devoted to the advancement of science.
Ajay- What are the latest products by Salford Systems? Any future product plans or modification to work on Big Data analytics, mobile computing and cloud computing.
Dan- Our central set of data mining technologies are CART, MARS, TreeNet, RandomForests, and PRIM, and we have always maintained feature rich logistic regression and linear regression modules. In our latest release scheduled for January 2012 we will be including a new data mining approach to linear and logistic regression allowing for the rapid processing of massive numbers of predictors (e.g., one million columns), with powerful predictor selection and coefficient shrinkage. The new methods allow not only classic techniques such as ridge and lasso regression, but also sub-lasso model sizes. Clear tradeoff diagrams between model complexity (number of predictors) and predictive accuracy allow the modeler to select an ideal balance suitable for their requirements.
The new version of our data mining suite, Salford Predictive Modeler (SPM), also includes two important extensions to the boosted tree technology at the heart of TreeNet. The first, Importance Sampled learning Ensembles (ISLE), is used for the compression of TreeNet tree ensembles. Starting with, say, a 1,000 tree ensemble, the ISLE compression might well reduce this down to 200 reweighted trees. Such compression will be valuable when models need to be executed in real time. The compression rate is always under the modeler’s control, meaning that if a deployed model may only contain, say, 30 trees, then the compression will deliver an optimal 30-tree weighted ensemble. Needless to say, compression of tree ensembles should be expected to be lossy and how much accuracy is lost when extreme compression is desired will vary from case to case. Prior to ISLE, practitioners have simply truncated the ensemble to the maximum allowable size. The new methodology will substantially outperform truncation.
The second major advance is RULEFIT, a rule extraction engine that starts with a TreeNet model and decomposes it into the most interesting and predictive rules. RULEFIT is also a tree ensemble post-processor and offers the possibility of improving on the original TreeNet predictive performance. One can think of the rule extraction as an alternative way to explain and interpret an otherwise complex multi-tree model. The rules extracted are similar conceptually to the terminal nodes of a CART tree but the various rules will not refer to mutually exclusive regions of the data.
Ajay- You have led teams that have won multiple data mining competitions. What are some of your favorite techniques or approaches to a data mining problem.
Dan- We only enter competitions involving problems for which our technology is suitable, generally, classification and regression. In these areas, we are partial to TreeNet because it is such a capable and robust learning machine. However, we always find great value in analyzing many aspects of a data set with CART, especially when we require a compact and easy to understand story about the data. CART is exceptionally well suited to the discovery of errors in data, often revealing errors created by the competition organizers themselves. More than once, our reports of data problems have been responsible for the competition organizer’s decision to issue a corrected version of the data and we have been the only group to discover the problem.
In general, tackling a data mining competition is no different than tackling any analytical challenge. You must start with a solid conceptual grasp of the problem and the actual objectives, and the nature and limitations of the data. Following that comes feature extraction, the selection of a modeling strategy (or strategies), and then extensive experimentation to learn what works best.
Ajay- I know you have created your own software. But are there other software that you use or liked to use?
Dan- For analytics we frequently test open source software to make sure that our tools will in fact deliver the superior performance we advertise. In general, if a problem clearly requires technology other than that offered by Salford, we advise clients to seek other consultants expert in that other technology.
Ajay- Your software is installed at 3500 sites including 400 universities as per http://www.salford-systems.com/company/aboutus/index.html What is the key to managing and keeping so many customers happy?
Dan- First, we have taken great pains to make our software reliable and we make every effort to avoid problems related to bugs. Our testing procedures are extensive and we have experts dedicated to stress-testing software . Second, our interface is designed to be natural, intuitive, and easy to use, so the challenges to the new user are minimized. Also, clear documentation, help files, and training videos round out how we allow the user to look after themselves. Should a client need to contact us we try to achieve 24-hour turn around on tech support issues and monitor all tech support activity to ensure timeliness, accuracy, and helpfulness of our responses. WebEx/GotoMeeting and other internet based contact permit real time interaction.
Ajay- What do you do to relax and unwind?
Dan- I am in the gym almost every day combining weight and cardio training. No matter how tired I am before the workout I always come out energized so locating a good gym during my extensive travels is a must. I am also actively learning Portuguese so I look to watch a Brazilian TV show or Portuguese dubbed movie when I have time; I almost never watch any form of video unless it is available in Portuguese.
Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.
Dr. Steinberg received his Ph.D. in Economics from Harvard University, and has given full day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. After earning a PhD in Econometrics at Harvard Steinberg began his professional career as a Member of the Technical Staff at Bell Labs, Murray Hill, and then as Assistant Professor of Economics at the University of California, San Diego. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.
His consulting experience at Salford Systems has included complex modeling projects for major banks worldwide, including Citibank, Chase, American Express, Credit Suisse, and has included projects in Europe, Australia, New Zealand, Malaysia, Korea, Japan and Brazil. Steinberg led the teams that won first place awards in the KDDCup 2000, and the 2002 Duke/TeraData Churn modeling competition, and the teams that won awards in the PAKDD competitions of 2006 and 2007. He has published papers in economics, econometrics, computer science journals, and contributes actively to the ongoing research and development at Salford.
Here is an interview with Scott Gidley, CTO and co-founder of leading data quality ccompany DataFlux . DataFlux is a part of SAS Institute and in 2011 acquired Baseline Consulting besides launching the latest version of their Master Data Management product. (more…)
Outstandingly attractive scholarships are available for students willing to travel to Yorkshire. Thats where the Battle of Roses was fought by the British Royal Family.
Emphasis and spaces in email above are made by me.
Message from Dr Top i bell ow-
It is not New York but very old York, in the North of England.
The scholarships carry a tax-free stipend and financial assistance will be
given for travel expenses to and from York. Accommodation for successful
students is available on the University of York Campus.
For information about the tax-free stipend please write to
I am hoping to put this on my pre-ordered or Amazon Wish list. The book the common people who wanted to do data mining with , but were unable to ask aloud they didnt know much. It is written by the seminal Australian authority on data mining Dr Graham Williams whom I interviewed here at http://decisionstats.com/2009/01/13/interview-dr-graham-williams/
Data Mining for the masses using an ergonomically designed Graphical User Interface.
Thank you Springer. Thank you Dr Graham Williams
Data Mining with Rattle and R
The Art of Excavating Data for Knowledge Discovery
Series: Use R
1st Edition., 2011, XX, 409 p. 150 illus. in color.
Softcover, ISBN 978-1-4419-9889-7
Due: August 29, 201154,95 €
- Encourages the concept of programming with data – more than just pushing data through tools, but learning to live and breathe the data
- Accessible to many readers and not necessarily just those with strong backgrounds in computer science or statistics
- Details some of the more popular algorithms for data mining, as well as covering model evaluation and model deployment
Data mining is the art and science of intelligent data analysis. By building knowledge from information, data mining adds considerable value to the ever increasing stores of electronic data that abound today. In performing data mining many decisions need to be made regarding the choice of methodology, the choice of data, the choice of tools, and the choice of algorithms.
Throughout this book the reader is introduced to the basic concepts and some of the more popular algorithms of data mining. With a focus on the hands-on end-to-end process for data mining, Williams guides the reader through various capabilities of the easy to use, free, and open source Rattle Data Mining Software built on the sophisticated R Statistical Software. The focus on doing data mining rather than just reading about data mining is refreshing.
The book covers data understanding, data preparation, data refinement, model building, model evaluation, and practical deployment. The reader will learn to rapidly deliver a data mining project using software easily installed for free from the Internet. Coupling Rattle with R delivers a very sophisticated data mining environment with all the power, and more, of the many commercial offerings.
Content Level » Research
Keywords » Data mining
Related subjects » Physical & Information Science
message from the official google blog-
With programs like Computer Science for High School (CS4HS), we hope to increase the number of CS majors —and therefore the number of people entering into careers in CS—by promoting computer science curriculum at the high school level.
For the fourth consecutive year, we’re funding CS4HS to invest in the next generation of computer scientists and engineers. CS4HS is a workshop for high school and middle school computer science teachers that introduces new and emerging concepts in computing and provides tips, tools and guidance on how to teach them. The ultimate goals are to “train the trainer,” develop a thriving community of high school CS teachers and spread the word about the awe and beauty of computing.
If you’re a university, community college, or technical School in the U.S., Canada, Europe, Middle East or Africa and are interested in hosting a workshop at your institution, please visit www.cs4hs.com to submit an application for grant funding.Applications will be accepted between January 18, 2011 and February 18, 2011.
In addition to submitting your application, on the CS4HS website you’ll find info on how to organize a workshop, as well as websites and agendas from last year’s participants to give you an idea of how the workshops were structured in the past. There’s also a collection ofCS4HS curriculum modules that previous participating schools have shared for future organizers to use in their own program.
- Supporting computer science education with CS4HS (googleblog.blogspot.com)
- Tell the US Federal Government how to fix K-12 CS Ed (computinged.wordpress.com)
- Google Reflects on Its Contributions to Computer Science Education (searchenginejournal.com)
- A Joint Call for Research on Why Computer Science Education is Important for K-12 (computinged.wordpress.com)
- Do High Schools Know What ‘Computer Science’ Is? (developers.slashdot.org)
- Advice On Teaching Linux To CS Freshmen? (ask.slashdot.org)
- Celebrating the second Computer Science Education Week (googleblog.blogspot.com)