Home » Posts tagged 'science' (Page 3)
Tag Archives: science
This is a piece of science fiction. I wrote while reading Isaac Assimov’s advice to writers in GOLD, while on a beach in Anjuna.
1) Identify senators, lobbyists, senior executives of companies advocating for SOPA. Go for selective targeting of these people than massive Denial of Service Attacks.
This could also include election fund raising websites in the United States.
2) Create hacking tools with simple interfaces to probe commonly known software errors, to enable wider audience including the Occupy Movement students to participate in hacking. thus making hacking more democratic. What are the top 25 errors as per http://cwe.mitre.org/cwss/
Easy interface tools to check vulnerabilities would be the next generation to flooding tools like HOIC, LOIC – Massive DDOS atttacks make good press coverage but not so good technically
3) Disrupt digital payment mechanisms for selected targets (in step1) using tools developed in Step 2, and introduce random noise errors in payment transfers.
4) Help create a better secure internet by embedding Tor within Chromium with all tools for anonymity embedded for easy usage – a more secure peer to peer browser (like a mashup of Opera , tor and chromium).
or maybe embed bit torrents within a browser.
5) Disrupt media companies and cloud computing based companies like iTunes, Spotify or Google Music, just like virus, ant i viruses disrupted the desktop model of computing. After that offer solutions to the problems like companies of anti virus software did for decades.
6) Hacking websites is fine fun, but hacking internet databases and massively parallel data scrapers can help disrupt some of the status quo.
This applies to databases that offer data for sale, like credit bureaus etc. Making this kind of data public will eliminate data middlemen.
7) Use cross border, cross country regulatory arbitrage for better risk control of hacker attacks.
8) recruiting among universities using easy to use hacking tools to expand the pool of dedicated hacker armies.
9) using operations like those targeting child pornography to increase political acceptability of the hacker sub culture. Refrain from overtly negative and unimaginative bad Press Relations
10) If you cant convince them to pass SOPA, confuse them ;) Use bots for random clicks on ads to confuse internet commerce.
Some stuff on Topic Models-
In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. An early topic model was probabilistic latent semantic indexing (PLSI), created by Thomas Hofmann in 1999. Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSI developed by David Blei, Andrew Ng, and Michael Jordan in 2002, allowing documents to have a mixture of topics. Other topic models are generally extensions on LDA, such as Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations which constitute topics. Although topic models were first described and implemented in the context of natural language processing, they have applications in other fields such as bioinformatics.
In statistics, latent Dirichlet allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word’s creation is attributable to one of the document’s topics. LDA is an example of a topic model
David M Blei’s page on Topic Models-
- a general introduction to topic modeling .
- At KDD-2011 a long tutorial about topic modeling. The slides are here .
- slides from a talk on dynamic and correlated topic models applied to the journal Science . (Here is a video of the talk.)
- a more technical review paper about this field.
- David Mimno maintains a bibliography of topic modeling papers and software.
The topic models mailing list is a good forum for discussing topic modeling.
Some resources I compiled on Slideshare based on the above- (more…)
Here is an interview with Zach Goldberg, who is the product manager of Google Prediction API, the next generation machine learning analytics-as-an-api service state of the art cloud computing model building browser app.
Ajay- Describe your journey in science and technology from high school to your current job at Google.
Zach- First, thanks so much for the opportunity to do this interview Ajay! My personal journey started in college where I worked at a startup named Invite Media. From there I transferred to the Associate Product Manager (APM) program at Google. The APM program is a two year rotational program. I did my first year working in display advertising. After that I rotated to work on the Prediction API.
Ajay- How does the Google Prediction API help an average business analytics customer who is already using enterprise software , servers to generate his business forecasts. How does Google Prediction API fit in or complement other APIs in the Google API suite.
Zach- The Google Prediction API is a cloud based machine learning API. We offer the ability for anybody to sign up and within a few minutes have their data uploaded to the cloud, a model built and an API to make predictions from anywhere. Traditionally the task of implementing predictive analytics inside an application required a fair amount of domain knowledge; you had to know a fair bit about machine learning to make it work. With the Google Prediction API you only need to know how to use an online REST API to get started.
Ajay- What are the additional use cases of Google Prediction API that you think traditional enterprise software in business analytics ignore, or are not so strong on. What use cases would you suggest NOT using Google Prediction API for an enterprise.
Zach- We are living in a world that is changing rapidly thanks to technology. Storing, accessing, and managing information is much easier and more affordable than it was even a few years ago. That creates exciting opportunities for companies, and we hope the Prediction API will help them derive value from their data.
The Prediction API focuses on providing predictive solutions to two types of problems: regression and classification. Businesses facing problems where there is sufficient data to describe an underlying pattern in either of these two areas can expect to derive value from using the Prediction API.
Ajay- What are your separate incentives to teach about Google APIs to academic or researchers in universities globally.
Zach- I’d refer you to our university relations page-
Google thrives on academic curiosity. While we do significant in-house research and engineering, we also maintain strong relations with leading academic institutions world-wide pursuing research in areas of common interest. As part of our mission to build the most advanced and usable methods for information access, we support university research, technological innovation and the teaching and learning experience through a variety of programs.
Ajay- What is the biggest challenge you face while communicating about Google Prediction API to traditional users of enterprise software.
Zach- Businesses often expect that implementing predictive analytics is going to be very expensive and require a lot of resources. Many have already begun investing heavily in this area. Quite often we’re faced with surprise, and even skepticism, when they see the simplicity of the Google Prediction API. We work really hard to provide a very powerful solution and take care of the complexity of building high quality models behind the scenes so businesses can focus more on building their business and less on machine learning.
Here is an interview with Markus Schmidberger, Senior Community Manager for cloudnumbers.com. Cloudnumbers.com is the exciting new cloud startup for scientific computing. It basically enables transition to a R and other platforms in the cloud and makes it very easy and secure from the traditional desktop/server model of operation.
Ajay- Describe the startup story for setting up Cloudnumbers.com
Markus- In 2010 the company founders Erik Muttersbach (TU München), Markus Fensterer (TU München) and Moritz v. Petersdorff-Campen (WHU Vallendar) started with the development of the cloud computing environment. (more…)
Here is an interview with Dan Steinberg, Founder and President of Salford Systems (http://www.salford-systems.com/ )
Ajay- Describe your journey from academia to technology entrepreneurship. What are the key milestones or turning points that you remember.
Dan- When I was in graduate school studying econometrics at Harvard, a number of distinguished professors at Harvard (and MIT) were actively involved in substantial real world activities. Professors that I interacted with, or studied with, or whose software I used became involved in the creation of such companies as Sun Microsystems, Data Resources, Inc. or were heavily involved in business consulting through their own companies or other influential consultants. Some not involved in private sector consulting took on substantial roles in government such as membership on the President’s Council of Economic Advisors. The atmosphere was one that encouraged free movement between academia and the private sector so the idea of forming a consulting and software company was quite natural and did not seem in any way inconsistent with being devoted to the advancement of science.
Ajay- What are the latest products by Salford Systems? Any future product plans or modification to work on Big Data analytics, mobile computing and cloud computing.
Dan- Our central set of data mining technologies are CART, MARS, TreeNet, RandomForests, and PRIM, and we have always maintained feature rich logistic regression and linear regression modules. In our latest release scheduled for January 2012 we will be including a new data mining approach to linear and logistic regression allowing for the rapid processing of massive numbers of predictors (e.g., one million columns), with powerful predictor selection and coefficient shrinkage. The new methods allow not only classic techniques such as ridge and lasso regression, but also sub-lasso model sizes. Clear tradeoff diagrams between model complexity (number of predictors) and predictive accuracy allow the modeler to select an ideal balance suitable for their requirements.
The new version of our data mining suite, Salford Predictive Modeler (SPM), also includes two important extensions to the boosted tree technology at the heart of TreeNet. The first, Importance Sampled learning Ensembles (ISLE), is used for the compression of TreeNet tree ensembles. Starting with, say, a 1,000 tree ensemble, the ISLE compression might well reduce this down to 200 reweighted trees. Such compression will be valuable when models need to be executed in real time. The compression rate is always under the modeler’s control, meaning that if a deployed model may only contain, say, 30 trees, then the compression will deliver an optimal 30-tree weighted ensemble. Needless to say, compression of tree ensembles should be expected to be lossy and how much accuracy is lost when extreme compression is desired will vary from case to case. Prior to ISLE, practitioners have simply truncated the ensemble to the maximum allowable size. The new methodology will substantially outperform truncation.
The second major advance is RULEFIT, a rule extraction engine that starts with a TreeNet model and decomposes it into the most interesting and predictive rules. RULEFIT is also a tree ensemble post-processor and offers the possibility of improving on the original TreeNet predictive performance. One can think of the rule extraction as an alternative way to explain and interpret an otherwise complex multi-tree model. The rules extracted are similar conceptually to the terminal nodes of a CART tree but the various rules will not refer to mutually exclusive regions of the data.
Ajay- You have led teams that have won multiple data mining competitions. What are some of your favorite techniques or approaches to a data mining problem.
Dan- We only enter competitions involving problems for which our technology is suitable, generally, classification and regression. In these areas, we are partial to TreeNet because it is such a capable and robust learning machine. However, we always find great value in analyzing many aspects of a data set with CART, especially when we require a compact and easy to understand story about the data. CART is exceptionally well suited to the discovery of errors in data, often revealing errors created by the competition organizers themselves. More than once, our reports of data problems have been responsible for the competition organizer’s decision to issue a corrected version of the data and we have been the only group to discover the problem.
In general, tackling a data mining competition is no different than tackling any analytical challenge. You must start with a solid conceptual grasp of the problem and the actual objectives, and the nature and limitations of the data. Following that comes feature extraction, the selection of a modeling strategy (or strategies), and then extensive experimentation to learn what works best.
Ajay- I know you have created your own software. But are there other software that you use or liked to use?
Dan- For analytics we frequently test open source software to make sure that our tools will in fact deliver the superior performance we advertise. In general, if a problem clearly requires technology other than that offered by Salford, we advise clients to seek other consultants expert in that other technology.
Ajay- Your software is installed at 3500 sites including 400 universities as per http://www.salford-systems.com/company/aboutus/index.html What is the key to managing and keeping so many customers happy?
Dan- First, we have taken great pains to make our software reliable and we make every effort to avoid problems related to bugs. Our testing procedures are extensive and we have experts dedicated to stress-testing software . Second, our interface is designed to be natural, intuitive, and easy to use, so the challenges to the new user are minimized. Also, clear documentation, help files, and training videos round out how we allow the user to look after themselves. Should a client need to contact us we try to achieve 24-hour turn around on tech support issues and monitor all tech support activity to ensure timeliness, accuracy, and helpfulness of our responses. WebEx/GotoMeeting and other internet based contact permit real time interaction.
Ajay- What do you do to relax and unwind?
Dan- I am in the gym almost every day combining weight and cardio training. No matter how tired I am before the workout I always come out energized so locating a good gym during my extensive travels is a must. I am also actively learning Portuguese so I look to watch a Brazilian TV show or Portuguese dubbed movie when I have time; I almost never watch any form of video unless it is available in Portuguese.
Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.
Dr. Steinberg received his Ph.D. in Economics from Harvard University, and has given full day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. After earning a PhD in Econometrics at Harvard Steinberg began his professional career as a Member of the Technical Staff at Bell Labs, Murray Hill, and then as Assistant Professor of Economics at the University of California, San Diego. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.
His consulting experience at Salford Systems has included complex modeling projects for major banks worldwide, including Citibank, Chase, American Express, Credit Suisse, and has included projects in Europe, Australia, New Zealand, Malaysia, Korea, Japan and Brazil. Steinberg led the teams that won first place awards in the KDDCup 2000, and the 2002 Duke/TeraData Churn modeling competition, and the teams that won awards in the PAKDD competitions of 2006 and 2007. He has published papers in economics, econometrics, computer science journals, and contributes actively to the ongoing research and development at Salford.