Tag Archives: quality
UPDATED POST: Some models I use for business strategy, to analyze the huge reams of qualitative and uncertain data that business generates. I have added a bonus: the Business Model Canvas (number 2).
- Porter's Five Forces Model - to analyze industries
- Business Model Canvas - to analyze business models
- BCG Matrix - to analyze product portfolios
- Porter's Diamond Model - to analyze locations
- McKinsey 7S Model - to analyze teams
- Greiner Growth Model - to analyze the growth of an organization
- Herzberg's Hygiene Theory - to analyze the soft aspects of individuals
- Marketing Mix Model - to analyze the marketing mix
Many data quality formats cause problems when imported into statistical software. A statistical package is generally unable to distinguish between $1,000, 1000%, 1,000 and 1000, and by default will treat the first three as character variables and only the last as a numeric variable. This issue is further compounded by the numerous ways we can represent date-time variables.
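A quick illustration (a minimal sketch with made-up values):

# hypothetical CSV with the same quantity in four formats
df <- read.csv(text = 'a,b,c,d\n"$1,000",1000%,"1,000",1000')
sapply(df, class)
# a, b and c come in as character (or factor in older R); only d is numeric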
The good thing is that for specific domains like finance and web analytics, even these weird input formats are fairly standardized, so we can put together a handy list of data quality conversion functions in R for reference.
After much muddling about with converting internet formats (data used in web analytics, mostly time formats without a date, like 00:35:23) into numeric data frame formats, I found a way to handle date-time conversions in R:
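A minimal sketch, using strptime (which produces the default-date behaviour described next):

# strptime parses a time string into a POSIXlt date-time value
t1 <- strptime("00:35:23", format = "%H:%M:%S")
t1  # prints with today's date attached, since no date was supplied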
The problem with this approach is that you get the value in a date-time format (e.g. 03/20/2012 04:00:45; by default R will add today's date to it), while you are interested only in time durations (04:00:45, or really just the equivalent in seconds).
This can be handled using the as.difftime function:
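For example:

# as.difftime turns the time string into a duration (difftime) object
d1 <- as.difftime("00:35:23", format = "%H:%M:%S")
d1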
Or, to get purely numeric values so we can do numeric analysis (like summary):
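Something like:

# as.numeric on a difftime, with units specified, gives a plain number
secs <- as.numeric(as.difftime("00:35:23", format = "%H:%M:%S"), units = "secs")
secs  # 2123
summary(secs)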
(Maybe there is a more elegant way here, but I don't know of one.)
This is the kind of data we usually get in web analytics for average time on site, etc.
For factor variables:
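A minimal sketch:

# convert factor -> character -> numeric (see the caveat about levels below)
x <- factor(c("10", "20", "30"))
as.numeric(as.character(x))  # [1] 10 20 30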
A slight problem: suppose there is a value like 1,504; it will be converted to NA instead of 1504.
The way to solve this is to use the nice gsub function on that variable only. Since the comma is also the most commonly used delimiter, you don't want to replace all the commas, only the ones inside that variable.
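For example:

# strip the thousands separator in just this one column, then convert
x <- factor(c("1,504", "2,300"))
as.numeric(gsub(",", "", as.character(x)))  # [1] 1504 2300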
Now let's assume we have data in the form of percentages, like 0.00%, 1.23%, 3.5%.
Again we use the gsub function, this time replacing the % in the string with nothing:
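For instance:

# drop the % sign, then convert; the values stay on the percent scale
p <- c("0.00%", "1.23%", "3.5%")
as.numeric(gsub("%", "", p))  # [1] 0.00 1.23 3.50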
Note that if you simply call as.numeric on a factor variable, it will give you the level code, not the value. This can create errors when reading in CSV data, which may come in as character or factor data types.
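To see the trap:

# as.numeric on a factor returns the internal level codes
f <- factor(c("10", "20", "10"))
as.numeric(f)                # [1] 1 2 1  (level codes)
as.numeric(as.character(f))  # [1] 10 20 10  (actual values)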
Additional ways to manipulate string/character variables are substr (to take substrings) and paste (to concatenate).
For example, iris$sp <- substr(as.character(iris$Species), 1, 3) will reduce the famous iris species names to three characters, without losing any analytical value.
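And a small paste sketch to go with it (the label column is purely illustrative):

# substr shortens the species name; paste glues strings together
iris$sp <- substr(as.character(iris$Species), 1, 3)
iris$label <- paste(iris$sp, iris$Sepal.Length, sep = "-")
head(iris$label)  # e.g. "set-5.1" "set-4.9" ...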
The other issue is missing values. While na.rm=TRUE helps in getting summaries of numeric variables that contain missing values, we need to further investigate how suitable functions like na.omit are for domains that have large amounts of missing data needing treatment.
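A minimal sketch of both:

# na.rm = TRUE skips NAs inside a summary; na.omit drops them from the data
x <- c(1, 2, NA, 4)
mean(x)                # NA
mean(x, na.rm = TRUE)  # 2.333333
na.omit(x)             # 1 2 4, with the dropped position kept as an attribute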
JMP, the visual data exploration and statistical quality control software from SAS Institute, launched version 10 of its software today.
JMP 10 includes:
Numerous enhancements to the drag-and-drop Graph Builder, including a new iPad application.
A cutting-edge Control Chart Builder to create process control charts with drag-and-drop ease.
New reliability capabilities, including growth and forecast models.
Additions and improvements for sorting and filtering data, design of experiments, statistical modeling, scripting, add-in and application development, script debugging and more.
From John Sall's blog post at http://blogs.sas.com/content/jmp/2012/03/20/discover-more-with-jmp-10/:
Much of the development centered on four focus areas:
1. Graph Builder everywhere. The Graph Builder platform itself has new features like Heatmap and Treemap, an elements palette and properties panel, making the choices more visible. But Graph Builder also has some descendants now, including the new Control Chart Builder, which makes creating control charts an interactive process. In addition, some of the drag-and-drop features that are used to change columns in Graph Builder are also available in Distribution, Fit Y by X, and a few other places. Finally, Graph Builder has been ported to the iPad. For the first time, you can use JMP for exploration and presentation on a mobile device for free. So just think of Graph Builder as gradually taking over in lots of places.
2. Expert-driven design: reliability, measurement systems, and partial least squares analyses.
3. Performance: this release has the most new multithreading so far.
4. Application development.
You can read more here: http://jmp.com/about/events/webcasts/jmpwebcast_detail.shtml?reglink=70130000001r9IP
In Part 3 of the series of predictions for 2012, here is Jill Dyché of Baseline Consulting/DataFlux.
Part 2 was Timo Elliott, SAP, at http://www.decisionstats.com/timo-elliott-on-2012/ and Part 1 was Jim Kobielus, Forrester, at http://www.decisionstats.com/jim-kobielus-on-2012/
Ajay: What are the top trends you saw happening in 2011?
Jill: Well, I hate to say I saw them coming, but I did. A lot of managers committed some pretty predictable mistakes in 2011. Here are a few we witnessed live and up close:
1. In the spirit of “size matters,” data warehouse teams continued to trumpet the volumes of stored data on their enterprise data warehouses. But a peek under the covers of these warehouses reveals that the data isn’t integrated. Essentially this means a variety of heterogeneous virtual data marts co-located on a single server. Neat. Big. Maybe even worthy of a magazine article about how many petabytes you’ve got. But it’s not efficient, and hardly the example of data standardization and re-use that everyone expects from analytical platforms these days.
2. Development teams still didn’t factor data integration and provisioning into their project plans in 2011. So we saw multiple projects spawn duplicate efforts around data profiling, cleansing, and standardization, not to mention conflicting policies and business rules for the same information. Bummer, since IT managers should know better by now. The problem is that no one owns the problem. Which brings me to the next mistake…
3. No one's accountable for data governance. Yeah, there's a council. And they meet. And they talk. Sometimes there's lunch. And then nothing happens because no one's really rewarded—or penalized for that matter—on data quality improvements or new policies. And so the reports spewing from the data mart are still fraught with errors, and no one trusts the resulting decisions.
But all is not lost since we’re seeing some encouraging signs already in 2012. And yes, I’d classify some of them as bona-fide trends.
Ajay: What are some of those trends?
Jill: Job descriptions for data stewards, data architects, Chief Data Officers, and other information-enabling roles are becoming crisper, and the KPIs for these roles are becoming more specific. Data management organizations are being divorced from specific lines of business and from IT, becoming specialty organizations—okay, COEs if you must—in their own right. The value proposition for master data management now includes not just the reconciliation of heterogeneous data elements but the support of key business strategies. And C-level executives are holding the data people accountable for improving speed to market and driving down costs—not just delivering cleaner data. In short, data is becoming a business enabler. Which, I have to just say editorially, is better late than never!
Ajay: Anything surprise you, Jill?
Jill: I have to say that Obama mentioning data management in his State of the Union speech was an unexpected but pretty powerful endorsement of the importance of information in both the private and public sectors.
I'm also sort of surprised that data governance isn't being driven more frequently by the need for internal and external privacy policies. Our clients are constantly asking us how to tightly couple privacy policies into their applications and data sources. The need to protect PCI data and other highly sensitive data elements has made executives twitchy. But they're still not linking that need to data governance.
I should also mention that I’ve been impressed with the people who call me who’ve had their “aha!” moment and realize that data transcends analytic systems. It’s operational, it’s pervasive, and it’s dynamic. I figured this epiphany would happen in a few years once data quality tools became a commodity (they’re far from it). But it’s happening now. And that’s good for all types of businesses.
Jill Dyché has written three books and numerous articles on the business value of information technology. She advises clients and executive teams on leveraging technology and information to enable strategic business initiatives. Last year her company Baseline Consulting was acquired by DataFlux Corporation, where she is currently Vice President of Thought Leadership. Find her blog posts on www.dataroundtable.com.