Tag Archives: Author
Here is an interview with Prof Rob J Hyndman, who has created many time series forecasting methods and authored books as well as R packages on the subject.
Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year, which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million, which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net).
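The ETS family he mentions is implemented in his forecast package on CRAN. As a minimal sketch (assuming the forecast package is installed; AirPassengers is a built-in R dataset, not data from the health-budget work), fitting such a model looks like this:

```r
# A minimal sketch of fitting an ETS (error, trend, seasonal) model,
# assuming the 'forecast' package is installed from CRAN.
library(forecast)

# AirPassengers is a built-in monthly time series (1949-1960)
fit <- ets(AirPassengers)

# prints the automatically selected model form and fit statistics
summary(fit)

# point forecasts and prediction intervals for the next 12 months
fc <- forecast(fit, h = 12)
plot(fc)
```

The ets() function chooses the error, trend and seasonal components automatically by information criteria, which is what makes it practical for large-scale budget forecasting.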
Here is an interview with one of the younger researchers and rock stars of the R Project, John Myles White, co-author of Machine Learning for Hackers.
Ajay- What inspired you guys to write Machine Learning for Hackers? What has been the public response to the book? Are you planning to write a second edition or a next book?
John- We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.
Ajay- What are the key things that a potential reader can learn from this book?
John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known but just as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVMs. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.
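The cross-validation idea he emphasizes can be shown in a few lines of base R. This is a sketch of my own on the built-in mtcars data, not code from the book; the model and fold count are illustrative:

```r
# 5-fold cross-validation, illustrated in base R: estimate the
# out-of-sample accuracy of a logistic regression predicting
# transmission type (am) from mpg and weight in mtcars.
set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

accuracy <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]        # fit on 4 folds
  test  <- mtcars[folds == i, ]        # score on the held-out fold
  fit   <- glm(am ~ mpg + wt, data = train, family = binomial)
  pred  <- predict(fit, newdata = test, type = "response") > 0.5
  mean(pred == (test$am == 1))
})

mean(accuracy)  # cross-validated accuracy estimate
```

Averaging accuracy over held-out folds, rather than scoring the training data, is exactly the habit the book tries to instill.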
Ajay- Describe your journey as a science student up to your PhD. What are your current research interests, and what initiatives have you undertaken with them?
John- As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?
Ajay- How can academia and private sector solve the shortage of trained data scientists (assuming there is one)?
John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.
(TIL he has played in several rock bands!)
I am just checking out the nice new R package created by BigML.com co-founder Justin Donaldson. The name of the new package is bigml, which can be a bit confusing since there already exist many big-prefixed packages in R (including biglm).
The bigml package is available at CRAN http://cran.r-project.org/web/packages/bigml/index.html
I just tweaked the code given at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/ to include the SSL authentication code from http://www.brocktibert.com/blog/2012/01/19/358/
so it goes
Loading required package: RJSONIO
Loading required package: RCurl
Loading required package: bitops
Loading required package: plyr

# download the file needed for authentication
# set the curl options
curl <- getCurlHandle()
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem",
                                                 package = "RCurl"),
                            ssl.verifypeer = FALSE))
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)

iris.model <- quickModel(iris, objective_field = 'Species')
Of course there are lots of goodies added there, so read the post yourself at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/
Incidentally, the author of this R package (bigml), Justin Donaldson, who goes by the name sudojudo at http://twitter.com/#!/sudojudo, has also recently authored two other R packages: tsne at http://cran.r-project.org/web/packages/tsne/index.html (a “pure R” implementation of the t-SNE algorithm, t-Distributed Stochastic Neighbor Embedding) and sculpt3d at http://cran.r-project.org/web/packages/sculpt3d/index.html (a GTK+ toolbar that allows more interactive control of a dataset inside the RGL plot window; controls for simple brushing, highlighting, labeling, and mouseMode changes are provided by point-and-click rather than through the R terminal interface).
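As a quick sketch of what the tsne package does (assuming it is installed from CRAN; the parameter values here are just illustrative), you can project the four iris measurements down to two dimensions:

```r
# Embed the 4-dimensional iris measurements into 2 dimensions with
# t-SNE, using the 'tsne' package (assumed installed from CRAN).
library(tsne)

set.seed(42)
emb <- tsne(iris[, 1:4], k = 2, perplexity = 30, max_iter = 500)

# plot the embedding, coloured by species
plot(emb, col = iris$Species, pch = 19,
     xlab = "t-SNE dimension 1", ylab = "t-SNE dimension 2")
```

The three species typically separate into visible clusters, which is the kind of structure t-SNE is designed to preserve.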
This, along with the fact that their recently released Python bindings for bigml.com were among the top news items on Hacker News, shows BigML.com is going for some traction in bringing cloud computing, better software interfaces and data mining together!
Now that you have read the basics at http://www.decisionstats.com/how-to-learn-to-be-a-hacker-easily/ (please do read it before reading what follows), here is a list of tutorials that you should study (in order of ease):
1) LEARN BASICS – enough to get you a job maybe if that’s all you wanted.
2) READ SOME MORE-
Lena’s Reverse Engineering Tutorial (use Google to find it). It includes 36 parts of individual cracking techniques and will teach you the basics of protection bypassing:
01. Olly + assembler + patching a basic reverseme
02. Keyfiling the reverseme + assembler
03. Basic nag removal + header problems
04. Basic + aesthetic patching
05. Comparing on changes in cond jumps, animate over/in, breakpoints
06. “The plain stupid patching method”, searching for textstrings
07. Intermediate level patching, Kanal in PEiD
08. Debugging with W32Dasm, RVA, VA and offset, using LordPE as a hexeditor
09. Explaining the Visual Basic concept, introduction to SmartCheck and configuration
10. Continued reversing techniques in VB, use of decompilers and a basic anti-anti-trick
11. Intermediate patching using Olly’s “pane window”
12. Guiding a program by multiple patching.
13. The use of API’s in software, avoiding doublechecking tricks
14. More difficult schemes and an introduction to inline patching
15. How to study behaviour in the code, continued inlining using a pointer
16. Reversing using resources
17. Insights and practice in basic (self)keygenning
18. Diversion code, encryption/decryption, selfmodifying code and polymorphism
19. Debugger detected and anti-anti-techniques
20. Packers and protectors : an introduction
21. Imports rebuilding
22. API Redirection
23. Stolen bytes
24. Patching at runtime using loaders from lena151 original
25. Continued patching at runtime & unpacking armadillo standard protection
26. Machine specific loaders, unpacking & debugging armadillo
27. tElock + advanced patching
28. Bypassing & killing server checks
29. Killing & inlining a more difficult server check
30. SFX, Run Trace & more advanced string searching
31. Delphi in Olly & DeDe
32. Author tricks, HIEW & approaches in inline patching
33. The FPU, integrity checks & loader versus patcher
34. Reversing techniques in packed software & a S&R loader for ASProtect
35. Inlining inside polymorphic code
If you want more free training – hang around this website
OWASP Cheat Sheet Series
- OWASP Top Ten Cheat Sheet
- Authentication Cheat Sheet
- Cross-Site Request Forgery (CSRF) Prevention Cheat Sheet
- Transport Layer Protection Cheat Sheet
- Cryptographic Storage Cheat Sheet
- Input Validation Cheat Sheet
- XSS Prevention Cheat Sheet
- DOM based XSS Prevention Cheat Sheet
- Forgot Password Cheat Sheet
- Query Parameterization Cheat Sheet
- SQL Injection Prevention Cheat Sheet
- Session Management Cheat Sheet
- HTML5 Security Cheat Sheet
- Web Service Security Cheat Sheet
- Application Security Architecture Cheat Sheet
- Logging Cheat Sheet
- JAAS Cheat Sheet
Draft OWASP Cheat Sheets
- Access Control Cheat Sheet
- REST Security Cheat Sheet
- Abridged XSS Prevention Cheat Sheet
- PHP Security Cheat Sheet
- Password Storage Cheat Sheet
- Secure Coding Cheat Sheet
- Threat Modeling Cheat Sheet
- Clickjacking Cheat Sheet
- Virtual Patching Cheat Sheet
- Secure SDLC Cheat Sheet
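The Query Parameterization and SQL Injection Prevention sheets above boil down to one habit: bind user input as parameters instead of pasting it into SQL strings. A minimal sketch in R (assuming the DBI and RSQLite packages are installed; the users table is invented for illustration):

```r
# Query parameterization illustrated with DBI/RSQLite (both assumed
# installed from CRAN): user input is bound as a parameter, so it is
# treated as data, never as SQL.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "users", data.frame(name = c("alice", "bob"),
                                      role = c("admin", "user")))

user_input <- "alice' OR '1'='1"   # a classic injection attempt

# BAD: string concatenation lets the input be interpreted as SQL
# query <- paste0("SELECT * FROM users WHERE name = '", user_input, "'")

# GOOD: parameterized query - the placeholder is bound safely
res <- dbGetQuery(con, "SELECT * FROM users WHERE name = ?",
                  params = list(user_input))
nrow(res)  # the injection string matches no user

dbDisconnect(con)
```

The same idea holds in every language the cheat sheets cover: prepared statements with bound parameters rather than string building.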
3) SPEND SOME MONEY on TRAINING
Module 1 – The x86 environment
- System Architecture
- Windows Memory Management
- Introduction to Assembly
- The stack
Module 2 – The exploit developer environment
- Setting up the exploit developer lab
- Using debuggers and debugger plugins to gather primitives
Module 3 – Saved Return Pointer Overwrite
- Saved return pointer overwrites
- Stack cookies
Module 4 – Abusing Structured Exception Handlers
- Abusing exception handler overwrites
- Bypassing Safeseh
Module 5 – Pointer smashing
- Function pointers
- Data/object pointers
- vtable/virtual functions
Module 6 – Off-by-one and integer overflows
- Integer overflows
Module 7 – Limited buffers
- Limited buffers, shellcode splitting
Module 8 – Reliability++ & reusability++
- Finding and avoiding bad characters
- Creative ways to deal with character set limitations
Module 9 – Fun with Unicode
- Exploiting Unicode based overflows
- Writing venetian alignment code
- Creating and Using venetian shellcode
Module 10 – Heap Spraying Fundamentals
- Heap Management and behaviour
- Heap Spraying for Internet Explorer 6 and 7
Module 11 – Egg Hunters
- Using and tweaking Egg hunters
- Custom egghunters
- Using Omelet egghunters
- Egghunters in a WoW64 environment
Module 12 – Shellcoding
- Building custom shellcode from scratch
- Understanding existing shellcode
- Writing portable shellcode
- Bypassing Antivirus
Module 13 – Metasploit Exploit Modules
- Writing exploits for the Metasploit Framework
- Porting exploits to the Metasploit Framework
Module 14 – ASLR
- Bypassing ASLR
Module 15 – W^X
- Bypassing NX/DEP
- Return Oriented Programming / Code Reuse (ROP) )
Module 16 – Advanced Heap Spraying
- Heap Feng Shui & heaplib
- Precise heap spraying in modern browsers (IE8 & IE9, Firefox 13)
Module 17 – Use After Free
- Exploiting Use-After-Free conditions
Module 18 – Windows 8
- Windows 8 Memory Protections and Bypass
ALSO GET CERTIFIED http://www.offensive-security.com/information-security-training/penetration-testing-with-backtrack/ ($950 cost)
the syllabus is here at
4) HANG AROUND OTHER HACKERS
or The Noir Hat Conferences-
or read this website
5) GET A DEGREE
Yes it is possible
The Johns Hopkins University Information Security Institute (JHUISI) is the University’s focal point for research and education in information security, assurance and privacy.
The Information Security Institute is now accepting applications for the Department of Defense’s Information Assurance Scholarship Program (IASP). This scholarship includes full tuition, a living stipend, books and health insurance. In return each student recipient must work for a DoD agency at a competitive salary for six months for every semester funded. The scholarship is open to American citizens only.
MASTER OF SCIENCE IN SECURITY INFORMATICS PROGRAM
The flagship educational experience offered by Johns Hopkins University in the area of information security and assurance is represented by the Master of Science in Security Informatics degree. Over thirty courses are available in support of this unique and innovative graduate program.
Disclaimer: I haven’t done any of these things; this is just a curated list from Quora, so I am open to feedback.
Use this at your own risk, subject to your own conscience, local legal jurisdiction and personal legal liability.
A nice workshop on using R for predictive modeling, by Max Kuhn, Director of Nonclinical Statistics at Pfizer, is on at PAW Toronto.
Monday, April 23, 2012 in Toronto
Full-day: 9:00am – 4:30pm
R for Predictive Modeling:
A Hands-On Introduction
Intended Audience: Practitioners who wish to learn how to execute on predictive analytics by way of the R language; anyone who wants “to turn ideas into software, quickly and faithfully.”
Knowledge Level: Either hands-on experience with predictive modeling (without R) or hands-on familiarity with any programming language (other than R) is sufficient background and preparation to participate in this workshop.
This one-day session provides a hands-on introduction to R, the well-known open-source platform for data analysis. Real examples are employed in order to methodically expose attendees to best practices driving R and its rich set of predictive modeling packages, providing hands-on experience and know-how. R is compared to other data analysis platforms, and common pitfalls in using R are addressed.
The instructor, a leading R developer and the creator of CARET, a core R package that streamlines the process for creating predictive models, will guide attendees on hands-on execution with R, covering:
- A working knowledge of the R system
- The strengths and limitations of the R language
- Preparing data with R, including splitting, resampling and variable creation
- Developing predictive models with R, including decision trees, support vector machines and ensemble methods
- Visualization: Exploratory Data Analysis (EDA), and tools that persuade
- Evaluating predictive models, including viewing lift curves, variable importance and avoiding overfitting
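The workflow in those bullet points can be sketched with the instructor's caret package. This is a minimal illustration (assuming caret and rpart are installed from CRAN; the built-in iris data stands in for a real dataset), not the workshop's own material:

```r
# Splitting, resampling, model training and evaluation with caret
# (package and its dependencies assumed installed from CRAN).
library(caret)

set.seed(1)
# split the data 80/20 into training and test sets
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# fit a decision tree, with 10-fold cross-validation for resampling
fit <- train(Species ~ ., data = training, method = "rpart",
             trControl = trainControl(method = "cv", number = 10))

# evaluate on the held-out data and inspect variable importance
confusionMatrix(predict(fit, testing), testing$Species)
varImp(fit)
```

Swapping `method = "rpart"` for another model name is all it takes to try support vector machines or ensembles, which is the consistency the package is known for.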
Hardware: Bring Your Own Laptop
Each workshop participant is required to bring their own laptop running Windows or OS X. The software used during this training program, R, is free and readily available for download.
Attendees receive an electronic copy of the course materials and related R code at the conclusion of the workshop.
- Workshop starts at 9:00am
- Morning Coffee Break at 10:30am – 11:00am
- Lunch provided at 12:30 – 1:15pm
- Afternoon Coffee Break at 2:30pm – 3:00pm
- End of the Workshop: 4:30pm
Max Kuhn, Director, Nonclinical Statistics, Pfizer
Max Kuhn is a Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He has been applying models in the pharmaceutical industry for over 15 years.
He is a leading R developer and the author of several R packages including the CARET package that provides a simple and consistent interface to over 100 predictive models available in R.
Mr. Kuhn has taught courses on modeling within Pfizer and externally, including a class for the India Ministry of Information Technology.
Here is an interview with Prof Benjamin Alamar, founding editor of the Journal of Quantitative Analysis in Sport, a professor of sports management at Menlo College and the Director of Basketball Analytics and Research for the Oklahoma City Thunder of the NBA.
Ajay- The movie Moneyball recently sparked mainstream interest in analytics in sports. Describe the role of analytics in sports management.
Benjamin- A very typical first step for a team is to utilize the tools of predictive analytics to help inform their draft decisions.
Benjamin- I got involved in sports through a company called Protrade Sports. Protrade initially was a fantasy sports company that was looking to develop a fantasy game based on advanced sports statistics and utilize a stock market concept instead of traditional drafting. I was hired due to my background in economics to develop the market aspect of the game.
There I met Roland Beech (who now works for the Mavericks) and Aaron Schatz (owner of footballoutsiders.com) and learned about the developing field of sports statistics. I then changed my research focus from economics to sports statistics and founded the Journal of Quantitative Analysis in Sports. Through the journal and my published research, I was able to establish a reputation of doing quality, useable work.
For students, I recommend developing very strong data management skills (SQL and the like) and thinking carefully about what sort of questions a general manager or coach would care about. Being able to demonstrate analytic skills around actionable research will generally attract the attention of pro teams.
Benjamin Alamar, Professor of Sport Management, Menlo College
Professor Benjamin Alamar is the founding editor of the Journal of Quantitative Analysis in Sport, a professor of sports management at Menlo College and the Director of Basketball Analytics and Research for the Oklahoma City Thunder of the NBA. He has published academic research in football, basketball and baseball, and has presented at numerous conferences on sports analytics. He is also a co-creator of ESPN’s Total Quarterback Rating and a regular contributor to the Wall Street Journal. He has consulted for teams in the NBA and NFL, provided statistical analysis for author Michael Lewis for his recent book The Blind Side, and worked with numerous startup companies in the field of sports analytics. Professor Alamar is also an award-winning economist who has worked academically and professionally in intellectual property valuation, public finance and public health. He received his PhD in economics from the University of California at Santa Barbara in 2001.
Prof Alamar is a speaker at Predictive Analytics World, San Francisco, and is doing a workshop there.
Track 1: Sports Analytics
Case Study: NFL, MLB, & NBA
Competing & Winning with Sports Analytics
The field of sports analytics ties together the tools of data management, predictive modeling and information systems to provide sports organizations a competitive advantage. The field is rapidly developing based on new and expanded data sources, greater recognition of its value, and the past successes of a variety of sports organizations. Teams in the NFL, MLB and NBA, as well as other organizations, have found a competitive edge with the application of sports analytics. The future of sports analytics can be seen by drawing on these past successes and the development of new tools.
You can learn more about Prof Alamar at his blog http://analyticfootball.blogspot.in/ or his journal at http://www.degruyter.com/view/j/jqas. His detailed background can be seen at http://menlo.academia.edu/BenjaminAlamar/CurriculumVitae