Predictive Modeling

> library(readxl)
> DIRTYSHOPCSV <- read_excel("C:/Users/Gustavo Giusti/Desktop/DIRTYSHOPCSV.xlsx")
> View(DIRTYSHOPCSV)
> drt=DIRTYSHOPCSV
> attach(drt)
> summary(IDADE)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.0    31.0    40.0    42.4    53.0    89.0       3
> library(psych)
> describe(IDADE)
   vars    n mean   sd median trimmed   mad min max range skew kurtosis   se
X1    1 2797 42.4 14.2     40    41.5 16.31   0  89    89 0.54    -0.28 0.27

describe() returns a data.frame of the relevant statistics: item name, item number, number of valid cases, mean, standard deviation, trimmed mean (with trim defaulting to .1), median (standard or interpolated), mad (median absolute deviation from the median), minimum, maximum, skew, kurtosis, standard error.

> describeBy(IDADE,STATUS)

 Descriptive statistics by group
group: bom
   vars    n  mean    sd median trimmed   mad min max range skew kurtosis   se
X1    1 1946 42.55 14.25     41   41.64 16.31  18  89    71 0.53    -0.36 0.32
---------------------------------------------------------------------------------------------------------
group: mau
   vars   n  mean    sd median trimmed   mad min max range skew kurtosis   se
X1    1 851 42.06 14.08     40   41.17 14.83   0  89    89 0.57    -0.11 0.48

Decision Trees

Open and attach the file
> library(readxl)
> TEBA <- read_excel("~/Parcial/TEBA.xlsx")
> View(TEBA)
> teba=TEBA
> attach(teba)

Required packages
> library(rpart)
> library(caret)
> library(partykit)
> library(rattle)
> library(rpart.plot)
> library(RColorBrewer)
> library(gmodels)

Creating the training and test sets
> set.seed(1234)
> index=createDataPartition(cancel,p=0.6,list=F)
> teba.learn=teba[index,]
> teba.test=teba[-index,]

Building the decision tree
> ad1=rpart(data=teba.learn,cancel~idade+linhas+temp_cli+renda+fatura+temp_rsd+local+tvcabo+debaut,method="class")

Plots
> prp(ad1,type=2,extra=104,nn=T,fallen.leaves=T,branch.col="red",branch.lty=5,box.col=c("white",'green'))
> fancyRpartPlot(ad1)

Reading the plot:
Root node (100% of the data): 76% cancel = nao, 24% cancel = sim. Split on the variable "local".
Intermediate node (39% of the data): 59% cancel = nao, 41% cancel = sim. Split on the variable "temp_cli".

> fancyRpartPlot(ad1,palettes=c("Greys",'Blues'))

Classification rules
> ad1
n= 1201

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 1201 287 nao (0.76103247 0.23896753)
   2) local=A,C 727 95 nao (0.86932600 0.13067400)
     4) temp_cli>=16.5 517 37 nao (0.92843327 0.07156673) *
     5) temp_cli< 16.5 210 58 nao (0.72380952 0.27619048)
      10) fatura< 497.5 108 13 nao (0.87962963 0.12037037) *
      11) fatura>=497.5 102 45 nao (0.55882353 0.44117647)
        22) tvcabo=nao 37 10 nao (0.72972973 0.27027027) *
        23) tvcabo=sim 65 30 sim (0.46153846 0.53846154)
          46) fatura>=809.5 26 8 nao (0.69230769 0.30769231) *
          47) fatura< 809.5 39 12 sim (0.30769231 0.69230769) *
   3) local=B,D 474 192 nao (0.59493671 0.40506329)
     6) temp_cli>=18.5 221 52 nao (0.76470588 0.23529412)
      12) fatura< 1075 157 20 nao (0.87261146 0.12738854) *
      13) fatura>=1075 64 32 nao (0.50000000 0.50000000)
        26) temp_cli>=31.5 25 5 nao (0.80000000 0.20000000) *
        27) temp_cli< 31.5 39 12 sim (0.30769231 0.69230769) *
     7) temp_cli< 18.5 253 113 sim (0.44664032 0.55335968)
      14) fatura< 466.5 108 34 nao (0.68518519 0.31481481)
        28) idade>=28.5 81 19 nao (0.76543210 0.23456790) *
        29) idade< 28.5 27 12 sim (0.44444444 0.55555556)
          58) linhas< 1.5 17 6 nao (0.64705882 0.35294118) *
          59) linhas>=1.5 10 1 sim (0.10000000 0.90000000) *
      15) fatura>=466.5 145 39 sim (0.26896552 0.73103448) *

Reading the output:
Node 2 has 727 customers and classifies all of them as "nao": 86.93% are "nao" and 13.07% are "sim", so 95 are misclassified.
Node k generates nodes 2k and 2k+1.
An asterisk (*) marks a terminal node (leaf).

Estimating the probabilities
> phat.test=predict(ad1,newdata=teba.test,type="prob")
Matrix with the probabilities of nao and sim (columns in alphabetical order)
> yhat.test=predict(ad1,newdata=teba.test,type="class")
Predicted class for each case (nao or sim)
> phat.test
         nao        sim
1  0.8796296 0.12037037
2  0.7654321 0.23456790
3  0.2689655 0.73103448
4  0.9284333 0.07156673
5  0.8726115 0.12738854
6  0.9284333 0.07156673
7  0.8796296 0.12037037
8  0.3076923 0.69230769
9  0.9284333 0.07156673
10 0.9284333 0.07156673

Classification matrix
> CrossTable(teba.test$cancel,yhat.test)

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  799

                 | yhat.test
teba.test$cancel |       nao |       sim | Row Total |
-----------------|-----------|-----------|-----------|
             nao |       534 |        75 |       609 |
                 |     6.397 |    23.494 |           |
                 |     0.877 |     0.123 |     0.762 |
                 |     0.850 |     0.439 |           |
                 |     0.668 |     0.094 |           |
-----------------|-----------|-----------|-----------|
             sim |        94 |        96 |       190 |
                 |    20.505 |    75.305 |           |
                 |     0.495 |     0.505 |     0.238 |
                 |     0.150 |     0.561 |           |
                 |     0.118 |     0.120 |           |
-----------------|-----------|-----------|-----------|
    Column Total |       628 |       171 |       799 |
                 |     0.786 |     0.214 |           |
-----------------|-----------|-----------|-----------|

Pruning the tree
> printcp(ad1)

Classification tree:
rpart(formula = cancel ~ idade + linhas + temp_cli + renda + fatura +
    temp_rsd + local + tvcabo + debaut, data = teba.learn, method = "class")

Variables actually used in tree construction:
[1] fatura   idade    linhas   local    temp_cli tvcabo

Root node error: 287/1201 = 0.23897

n= 1201

        CP nsplit rel error  xerror     xstd
1 0.077816      0   1.00000 1.00000 0.051494
2 0.026132      3   0.76655 0.82927 0.048134
3 0.013937      5   0.71429 0.81882 0.047904
4 0.013066      7   0.68641 0.81882 0.047904
5 0.010000     11   0.63415 0.81882 0.047904

Reading the output:
The complexity parameter (CP) used for pruning is chosen by cross-validation.
Cross-validation error (xerror): the average of the errors over the cross-validation folds, expressed relative to the root node error.
Both rel error and xerror are relative to the root node error. For the last row, resubstitution error = 0.239 * 0.634 ≈ 0.152 and cross-validation error = 0.819 * 0.239 ≈ 0.196.
Prune using the CP that corresponds to the smallest xerror. Rows 3 to 5 are tied at 0.81882, and which.min returns the first of them, so the tree is pruned at CP = 0.013937 (5 splits).

> ad2=prune(ad1,cp=ad1$cptable[which.min(ad1$cptable[,"xerror"]),"CP"])
> ad2
n= 1201

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 1201 287 nao (0.7610325 0.2389675)
   2) local=A,C 727 95 nao (0.8693260 0.1306740) *
   3) local=B,D 474 192 nao (0.5949367 0.4050633)
     6) temp_cli>=18.5 221 52 nao (0.7647059 0.2352941)
      12) fatura< 1075 157 20 nao (0.8726115 0.1273885) *
      13) fatura>=1075 64 32 nao (0.5000000 0.5000000)
        26) temp_cli>=31.5 25 5 nao (0.8000000 0.2000000) *
        27) temp_cli< 31.5 39 12 sim (0.3076923 0.6923077) *
     7) temp_cli< 18.5 253 113 sim (0.4466403 0.5533597)
      14) fatura< 466.5 108 34 nao (0.6851852 0.3148148) *
      15) fatura>=466.5 145 39 sim (0.2689655 0.7310345) *
> prp(ad2,type=2,extra=104,nn=T,fallen.leaves=T,branch.col="red",branch.lty=5,box.col=c("white",'green'))

Logistic Regression

Reading the file
> library(readxl)
> VENDEDORES <- read_excel("C:/Users/Gustavo Giusti/OneDrive/FGV/2017.2/Métodos Multivariados em Administração/Regressão Logística/VENDEDORES.xlsx")
> View(VENDEDORES)
> vendedorescsv=VENDEDORES
> attach(vendedorescsv)

Box plots
> par(mfrow=c(1,3))
> boxplot(IDADE~DESEMP,xlab="IDADE")
> boxplot(ENTREV~DESEMP,xlab="ENTREV")
> boxplot(EXPER~DESEMP,xlab="EXPER")

Discretizing variables into equal-frequency classes
> library(arules)
> kidade=discretize(IDADE,method='frequency',categories=3)

Contingency tables
> library(gmodels)
> CrossTable(SEX,DESEMP)

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  59

             | DESEMP
         SEX |         B |         M | Row Total |
-------------|-----------|-----------|-----------|
         FEM |        12 |        10 |        22 |
             |     0.036 |     0.049 |           |
             |     0.545 |     0.455 |     0.373 |
             |     0.353 |     0.400 |           |
             |     0.203 |     0.169 |           |
-------------|-----------|-----------|-----------|
        MASC |        22 |        15 |        37 |
             |     0.022 |     0.029 |           |
             |     0.595 |     0.405 |     0.627 |
             |     0.647 |     0.600 |           |
             |     0.373 |     0.254 |           |
-------------|-----------|-----------|-----------|
Column Total |        34 |        25 |        59 |
             |     0.576 |     0.424 |           |
-------------|-----------|-----------|-----------|

Creating the dummy variable (it must exist before the steps that use DESEMP.B)
> DESEMP.B=ifelse(DESEMP=="B",1,0)

Checking for multicollinearity
> library(car)
> fit=lm(DESEMP.B~IDADE+ENTREV+EXPER,data=vendedorescsv)
> vif(fit)
   IDADE   ENTREV    EXPER
1.336334 1.066761 1.261515

Fitting the logistic regression model
> mod1=glm(DESEMP.B~ENTREV+EXPER+IDADE+SEX,family=binomial())
> summary(mod1)

Call:
glm(formula = DESEMP.B ~ ENTREV + EXPER + IDADE + SEX, family = binomial())

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0127  -0.7034   0.1472   0.5926   2.1905

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.33466    3.86803  -3.706 0.000211 ***
ENTREV        1.18507    0.34739   3.411 0.000646 ***
EXPER         0.02626    0.10904   0.241 0.809702
IDADE         0.13983    0.06183   2.261 0.023734 *
SEXMASC       0.56778    0.72330   0.785 0.432465
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 80.413  on 58  degrees of freedom
Residual deviance: 52.081  on 54  degrees of freedom
AIC: 62.081

Number of Fisher Scoring iterations: 5

Reading the output:
Estimate: how much Z (the linear predictor, i.e. the log-odds of DESEMP = B) increases for a one-unit increase in Xi, with the other variables held fixed.

Statistics for computing the LRT
> mod2=glm(I(DESEMP=="B")~1,family=binomial())
> logLik(mod2)
'log Lik.' -40.20656 (df=1)
> logLik(mod1)
'log Lik.' -26.04046 (df=5)

From these values, LRT = 2 * (-26.04046 - (-40.20656)) = 28.3322 with 5 - 1 = 4 degrees of freedom, which matches the difference between the null and residual deviances above (80.413 - 52.081 = 28.332).

Estimating the probabilities
> pbom=predict(mod1,type="response")

Model fit check
> kp=discretize(pbom,method='frequency',categories=5)
> table(kp,DESEMP)
                 DESEMP
kp                 B  M
  [0.0623,0.223)   2 10
  [0.2233,0.493)   4  8
  [0.4934,0.799)   7  5
  [0.7985,0.925)  10  2
  [0.9251,0.997]  11  0

Hosmer & Lemeshow test
> library(ResourceSelection)
> h1=hoslem.test(mod1$y,fitted(mod1),g=10)

Spiegelhalter test

Cluster Analysis

Reading the file
> library(readxl)
> GRAMPERS <- read_excel("C:/Users/Gustavo Giusti/OneDrive/FGV/2017.2/Métodos Multivariados em Administração/Cluster Analysis/GRAMPERS.xlsx")
> View(GRAMPERS)

Remove cases with missing values
> GRAMPERS=na.omit(GRAMPERS)

Select the numeric columns and create the data set gnum
> nums<-sapply(GRAMPERS,is.numeric)
> nums
    CLIENTE    DESPESAS     MAX_INT       ITENS  FREQUÊNCIA       IDADE        SEXO DEPENDENTES      REGIAO    OCUPAÇÃO
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE       FALSE        TRUE       FALSE       FALSE
  EST_CIVIL
      FALSE
> gnum=GRAMPERS[,nums]
> colnames(gnum)
[1] "CLIENTE"     "DESPESAS"    "MAX_INT"     "ITENS"       "FREQUÊNCIA"  "IDADE"       "DEPENDENTES"

Create the data set gdrive with the drivers only
> gdrive=gnum[,-c(1,6,7)]
> colnames(gdrive)
[1] "DESPESAS"   "MAX_INT"    "ITENS"      "FREQUÊNCIA"

Standardize the variables
> gdrive=scale(gdrive)
> gdrive=as.data.frame(gdrive)

Compute the distance matrix
> gdist=dist(gdrive,method="euclidean")
> gg=as.matrix(gdist)
> gg
1 2 3 4 5 6 7 8 9 10 11 12 13
1 0.0000000 3.315140 3.1906675 2.794454 2.244746 2.0869780 3.3442770 3.793171 3.9408053 5.215251 2.8878759 4.2864815 0.6677304
2 3.3151398 0.000000 3.7569818 2.232808 1.416499 1.4308177 2.4397616 2.018264 1.8481713 2.354323 2.0850804 2.3437745 2.9935199
3 3.1906675 3.756982 0.0000000 2.762449 2.591702 3.2768254 1.9960095 4.365486 2.8327340 5.311192 4.9256301 2.8798425 3.5661716
4 2.7944543 2.232808 2.7624492 0.000000 1.270377 2.4015510 2.8520687 1.705414 2.7624492 4.506708 3.3826726 3.2030928 2.6019549
5 2.2447464 1.416499 2.5917022 1.270377 0.000000 1.1901080 1.9802160 2.172556 2.0105040
3.591555 2.5362685 2.4720551 2.0855550 6 2.0869780 1.430818 3.2768254 2.401551 1.190108 0.0000000 2.2002611 2.847638 2.3212060 3.133717 1.7036997 2.7307382 1.8828033 7 3.3442770 2.439762 1.9960095 2.852069 1.980216 2.2002611 0.0000000 3.866735 1.1442290 3.404015 3.8379817 1.1563952 3.5108024 8 3.7931706 2.018264 4.3654860 1.705414 2.172556 2.8476383 3.8667354 0.000000 3.3216090 4.084229 3.0038360 3.8221769 3.3340052 9 3.9408053 1.848171 2.8327340 2.762449 2.010504 2.3212060 1.1442290 3.321609 0.0000000 2.609676 3.6827512 0.5187595 3.9233116 10 5.2152508 2.354323 5.3111920 4.506708 3.591555 3.1337165 3.4040155 4.084229 2.6096760 0.000000 3.3419531 2.7522153 4.9702212 11 2.8878759 2.085080 4.9256301 3.382673 2.536269 1.7036997 3.8379817 3.003836 3.6827512 3.341953 0.0000000 4.1211174 2.3739764 12 4.2864815 2.343775 2.8798425 3.203093 2.472055 2.7307382 1.1563952 3.822177 0.5187595 2.752215 4.1211174 0.0000000 4.3245086 13 0.6677304 2.993520 3.5661716 2.601955 2.085555 1.8828033 3.5108024 3.334005 3.9233116 4.970221 2.3739764 4.3245086 0.0000000 14 3.3361082 2.118810 2.5401883 2.999178 1.951871 1.8863378 0.5983850 3.789617 1.0502231 2.886855 3.4311939 1.1377958 3.4325908 15 2.4303801 1.535127 4.2095593 2.591344 1.754774 1.1288724 3.2478972 2.415307 3.1138414 3.312619 0.8019294 3.5816854 1.9296998 16 2.9184525 3.049863 1.2324506 2.863859 2.145502 2.4561684 0.9638671 4.199277 2.0478595 4.301069 4.1584917 2.0755323 3.2446945 17 5.9203264 3.346544 5.8019976 5.237742 4.386841 3.9939783 3.9848629 4.904809 3.2969202 1.847526 4.2670713 3.3303805 5.7260934 18 2.7575657 1.233479 3.3631174 2.726487 1.488499 0.7378547 1.8643823 3.068412 1.8127816 2.489945 2.0281451 2.1849096 2.6122974 19 3.7833911 2.311095 2.4104830 2.975050 2.152996 2.3917646 0.5645801 3.798252 0.6788524 3.032959 3.9213354 0.5924899 3.8788836 20 2.7684097 2.307061 3.1684116 3.415208 2.165935 1.4058609 1.6832825 4.079629 2.1654663 3.023732 2.7449528 2.3315885 2.8587154 21 4.3121322 1.854434 3.3065694 3.046474 
2.323487 2.5512178 1.5905381 3.373205 0.4839687 2.247768 3.7441016 0.6752578 4.2454055 22 1.5732675 3.321555 2.1707770 3.073767 2.248138 2.1088173 2.2421383 4.345228 3.1720436 4.827254 3.5454036 3.3576696 2.0786052 23 3.2984318 2.784995 1.5180288 2.813003 2.093145 2.4640554 0.4999512 4.014804 1.5180288 3.895826 4.1475538 1.5030643 3.5335664 24 3.6924459 2.968369 1.5069820 2.870955 2.327483 2.8427724 0.8097700 4.074409 1.5069820 4.030140 4.5013590 1.4164000 3.9112507 25 3.4241138 3.841794 0.2612237 2.809171 2.711324 3.4478280 2.0699772 4.404391 2.8447530 5.371991 5.0958588 2.8714933 3.7804899 14 15 16 17 18 19 20 21 22 23 24 251 3.3361082 2.4303801 2.9184525 5.920326 2.7575657 3.7833911 2.7684097 4.3121322 1.5732675 3.2984318 3.6924459 3.4241138 2 2.1188100 1.5351273 3.0498633 3.346544 1.2334785 2.3110946 2.3070611 1.8544337 3.3215545 2.7849953 2.9683695 3.8417939 3 2.5401883 4.2095593 1.2324506 5.801998 3.3631174 2.4104830 3.1684116 3.3065694 2.1707770 1.5180288 1.5069820 0.2612237 4 2.9991784 2.5913439 2.8638595 5.237742 2.7264869 2.9750495 3.4152081 3.0464736 3.0737673 2.8130033 2.8709549 2.8091707 5 1.9518707 1.7547737 2.1455020 4.386841 1.4884986 2.1529961 2.1659349 2.3234870 2.2481385 2.0931449 2.3274833 2.7113242 6 1.8863378 1.1288724 2.4561684 3.993978 0.7378547 2.3917646 1.4058609 2.5512178 2.1088173 2.4640554 2.8427724 3.4478280 7 0.5983850 3.2478972 0.9638671 3.984863 1.8643823 0.5645801 1.6832825 1.5905381 2.2421383 0.4999512 0.8097700 2.0699772 8 3.7896171 2.4153066 4.1992767 4.904809 3.0684125 3.7982524 4.0796290 3.3732054 4.3452283 4.0148042 4.0744092 4.4043906 9 1.0502231 3.1138414 2.0478595 3.296920 1.8127816 0.6788524 2.1654663 0.4839687 3.1720436 1.5180288 1.5069820 2.8447530 10 2.8868547 3.3126191 4.3010694 1.847526 2.4899445 3.0329585 3.0237320 2.2477677 4.8272536 3.8958262 4.0301399 5.3719906 11 3.4311939 0.8019294 4.1584917 4.267071 2.0281451 3.9213354 2.7449528 3.7441016 3.5454036 4.1475538 4.5013590 5.0958588 12 1.1377958 3.5816854 2.0755323 
3.330380 2.1849096 0.5924899 2.3315885 0.6752578 3.3576696 1.5030643 1.4164000 2.8714933 13 3.4325908 1.9296998 3.2446945 5.726093 2.6122974 3.8788836 2.8587154 4.2454055 2.0786052 3.5335664 3.9112507 3.7804899 14 0.0000000 2.9179202 1.4270330 3.544567 1.4078367 0.6772465 1.2480686 1.3985616 2.3434870 1.0838598 1.3940373 2.6354098 15 2.9179202 0.0000000 3.5314372 4.236057 1.5865093 3.3526148 2.4277527 3.2341111 3.0366373 3.5159501 3.8447680 4.3689442 16 1.4270330 3.5314372 0.0000000 4.831761 2.3838397 1.5030643 2.0071867 2.5193380 1.6228634 0.5924899 0.9735369 1.3859218 17 3.5445666 4.2360575 4.8317613 0.000000 3.3928637 3.6213738 3.7353150 2.9714956 5.4321489 4.4300582 4.5233657 5.8473402 18 1.4078367 1.5865093 2.3838397 3.392864 0.0000000 1.9266080 1.0941339 1.9731033 2.4345938 2.2465787 2.5965031 3.5062357 19 0.6772465 3.3526148 1.5030643 3.621374 1.9266080 0.0000000 1.9253151 1.0815246 2.7784389 0.9553319 1.0064341 2.4391456 20 1.2480686 2.4277527 2.0071867 3.735315 1.0941339 1.9253151 0.0000000 2.4099586 1.9466269 2.0151123 2.4618927 3.3442055 21 1.3985616 3.2341111 2.5193380 2.971496 1.9731033 1.0815246 2.4099586 0.0000000 3.6022310 1.9838735 1.9486694 3.3084797 22 2.3434870 3.0366373 1.6228634 5.432149 2.4345938 2.7784389 1.9466269 3.6022310 0.0000000 2.1190377 2.5588758 2.4264129 23 1.0838598 3.5159501 0.5924899 4.430058 2.2465787 0.9553319 2.0151123 1.9838735 2.1190377 0.0000000 0.4934225 1.5872096 24 1.3940373 3.8447680 0.9735369 4.523366 2.5965031 1.0064341 2.4618927 1.9486694 2.5588758 0.4934225 0.0000000 1.4926587 25 2.6354098 4.3689442 1.3859218 5.847340 3.5062357 2.4391456 3.3442055 3.3084797 2.4264129 1.5872096 1.4926587 0.0000000 26 27 28 29 30 31 32 33 34 35 36 37 38 1 2.0418719 2.2177677 2.7985698 3.9546999 3.0849435 3.9304270 3.0216257 3.1402454 5.486003 0.9633734 4.966544 4.625804 3.6078128 2 3.1966572 2.1335270 1.9014194 1.8597836 1.5169662 2.4032431 2.1188100 2.1082947 2.688574 3.0031844 2.851546 1.701016 2.5596094 3 2.4099586 2.6489685 
3.6444563 2.8328000 3.1697137 2.4076271 2.7808395 3.4678089 4.764877 2.7695034 5.060940 4.300696 1.9477085 4 3.3714466 2.8165036 3.3470582 2.7650909 2.9339525 2.9977580 3.1382711 3.5211480 4.390057 1.9291794 4.651768 3.187319 2.8647908 5 2.3620517 1.6048119 2.0809907 2.0209986 1.7270249 2.2498775 1.9576170 2.2686018 3.623451 1.7422046 3.677448 2.640187 2.1228673 6 1.9979508 1.0707859 1.0727510 2.3384818 1.2334785 2.5589249 1.5604532 1.4883111 3.520186 2.1100598 3.161659 2.785683 2.4764868 7 1.9640184 1.4882445 2.1274361 1.1518931 1.4373774 0.6970186 1.0591878 1.7721315 2.926712 3.1404426 3.299602 2.817192 0.3579732 8 4.5083945 3.6652886 3.7275812 3.3247074 3.4014294 3.8242982 3.8775187 4.0095832 4.277158 3.1117710 4.556339 2.883427 3.8903986 9 2.9125065 2.0838043 2.2929632 0.0193499 1.4222177 0.6686409 1.5438525 2.0003291 2.031260 3.6082872 2.858183 1.892447 1.0577214 10 4.4030183 3.3877806 2.6392611 2.6179842 2.3808273 3.1043821 2.8849086 2.5464450 1.506946 5.1538006 1.975007 2.080133 3.4919066 11 3.4638705 2.6740562 2.1082947 3.6992354 2.5824317 4.0788336 3.0682656 2.6700017 4.248579 3.0577634 3.580548 3.401806 4.0968138 12 3.0458363 2.3319666 2.5587634 0.5107585 1.7101957 0.4968524 1.6832825 2.1671889 1.982449 3.9910271 2.940238 2.189391 0.9937048 13 2.4499886 2.3359909 2.7254740 3.9375532 3.0354877 4.0253614 3.1170320 3.1522981 5.352163 1.0161133 4.819727 4.383707 3.7682115 14 1.9597823 1.2386682 1.6228634 1.0652672 0.9022256 0.8804205 0.5804970 1.2264802 2.577419 3.2290436 2.836207 2.544640 0.8656938 15 3.0262505 2.1910251 1.8843979 3.1293143 2.1554883 3.4980902 2.6283079 2.4048719 3.983398 2.4136268 3.522081 3.007018 3.4840385 16 1.5286422 1.6116018 2.5540322 2.0539753 2.0976479 1.5826922 1.6048633 2.2961570 3.871752 2.7424989 4.043520 3.614087 1.0934896 17 4.9943155 4.1065848 3.4770418 3.3021401 3.2230555 3.6685617 3.5808241 3.3177650 1.623936 5.8703816 1.103375 2.209982 4.0396759 18 2.1489172 1.0594898 0.7163371 1.8314806 0.5725396 2.0999146 1.1087933 
0.9569850 2.838079 2.7573094 2.609950 2.341128 2.1305344
19 2.4767037 1.8461485 2.2577335 0.6829765 1.4485396 0.2031740 1.2577436 1.8698024 2.433585 3.5314372 3.060660 2.455532 0.4745941
20 1.4203597 0.6955385 0.7045640 2.1838887 0.9220824 2.1284891 0.6675716 0.4861878 3.220911 3.0103567 2.829814 3.114357 2.0293721
21 3.3108004 2.4214562 2.4453958 0.4800849 1.5998895 1.0548413 1.8330425 2.1562612 1.609019 3.9881217 2.672528 1.642588 1.5043709
22 0.6669589 1.4402546 2.3670266 3.1850012 2.4612464 2.9269333 2.0766590 2.4129670 4.819604 1.8793244 4.464398 4.332448 2.5206459
23 1.9447922 1.7115713 2.5050753 1.5217241 1.8785743 1.0064341 1.4592874 2.1852450 3.361642 3.0464670 3.732269 3.182426 0.5102366
24 2.4264129 2.1746755 2.9162964 1.5043709 2.2133985 0.9553319 1.8630534 2.5808839 3.348124 3.3473598 3.915171 3.199630 0.5593332
25 2.6473345 2.8382680 3.8142171 2.8430414 3.2952020 2.4144407 2.9204753 3.6207625 4.768853 2.9621935 5.141940 4.317412 1.9766887
39 40
1 2.7928897 4.0034071
2 3.4145581 1.9159496
3 1.0059746 3.4118161
4 2.9911153 3.4038894
5 2.3723018 2.3794919
6 2.6970456 2.1746043
7 1.4498092 1.4402546
8 4.4362662 3.7829545
9 2.5222158 0.9702130
10 4.7579080 1.9877387
11 4.3793637 3.3744707
12 2.5548676 1.0419021
13 3.2005047 3.9750583
14 1.8893813 0.9502232
15 3.7498719 2.9802423
16 0.4861878 2.3615829
17 5.2698053 2.7688728
18 2.7283544 1.4809060
19 1.9874095 1.1033747
20 2.2961570 1.6511358
21 2.9975688 0.9381200
22 1.4424108 3.1998833
23 1.0565712 1.9358746
24 1.3519041 2.1002795
25 1.2142840 3.4753009
 [ reached getOption("max.print") -- omitted 15 rows ]

Run the hierarchical analysis with method="ward.D"
> hclust1=hclust(gdist,method="ward.D")
> plot(hclust1)
> plot(hclust1,xlab="",sub="")

Determine the number of groups visually
> abline(h=10)
> rect.hclust(hclust1,k=3,border="red")

Create a variable with the group labels
> gclust=cutree(hclust1,k=3)
> gfinal=as.data.frame(cbind(GRAMPERS,gclust))

Compute the differences between consecutive merge heights to find where the largest jump occurs

Merge heights
> alt1=hclust1$height
> length(alt1)
[1] 39
> alt1[-length(alt1)]
 [1]  0.0193499  0.2031740  0.2612237  0.3579732  0.4861878  0.4861878  0.4934225  0.5725396  0.5804970  0.6362524  0.6585035
[12]  0.6669589  0.6677304  0.6968946  0.7639477  0.8019294  1.0707859  1.0970810  1.1033747  1.2136399  1.2184315  1.2703767
[23]  1.4208837  1.5069458  1.5344150  1.7010156  2.0456098  2.1618541  2.3562702  2.5495215  3.3579000  3.5815019  3.8620021
[34]  4.3477895  7.5456519  8.8562412  9.6626468 13.0525720

Shift the heights by one position, starting with 0
> alt1.lag=c(0,alt1[-length(alt1)])

Compute the jumps between consecutive merges
> salto1=alt1-alt1.lag
> max(salto1)
[1] 3.389925
> which.max(salto1)
[1] 38

The largest jump is the one from merge 37 to merge 38 (13.0525720 - 9.6626468 = 3.389925). Stopping after merge 37 leaves 40 - 37 = 3 groups, which agrees with the k=3 chosen visually.

List the order of the merges
> hclust1$merge
      [,1] [,2]
 [1,]   -9  -29
 [2,]  -19  -31
 [3,]   -3  -25
 [4,]   -7  -38
 [5,]  -16  -39
 [6,]  -20  -33
 [7,]  -23  -24
 [8,]  -18  -30
 [9,]  -14  -32
[10,]  -21    1
[11,]  -12    2
[12,]  -22  -26
[13,]   -1  -13
[14,]  -28    6
[15,]    4    7
[16,]  -11  -15
[17,]   -6  -27
[18,]  -35   13
[19,]  -17  -36
[20,]    8   17
[21,]  -40    9
[22,]   -4   -5
[23,]   10   11
[24,]  -10  -34
[25,]   14   20
[26,]   -2  -37
[27,]    3    5
[28,]   -8   22
[29,]   19   24
[30,]   21   25
[31,]   26   28
[32,]   15   27
[33,]   12   18
[34,]   16   31
[35,]   23   32
[36,]   33   34
[37,]   30   36
[38,]   29   37
[39,]   35   38

Describing the clusters

Compare the clusters on the numeric drivers
> aggregate(gnum[-1],by=list(gclust),median)
  Group.1 DESPESAS MAX_INT ITENS FREQUÊNCIA IDADE DEPENDENTES
1       1    280.5      12   3.0        3.5    44           1
2       2     84.0      10   2.5        1.0    30           1
3       3    187.0      29   5.0        1.0    49           0
> boxplot(gnum$DESPESAS~gclust,main="DESPESAS")
> boxplot(gnum$MAX_INT~gclust,main="MAX_INT")
> boxplot(gnum$FREQUÊNCIA~gclust,main="FREQUÊNCIA")
> boxplot(gnum$ITENS~gclust,main="ITENS")

Discrimination: analysis of the "non-drivers"
> prop.table(table(GRAMPERS$REGIAO,gclust),1)
    gclust
              1          2          3
  AA 0.84615385 0.15384615 0.00000000
  BB 0.57142857 0.35714286 0.07142857
  CC 0.23076923 0.53846154 0.23076923
> prop.table(table(GRAMPERS$REGIAO,gclust),2)
    gclust
             1         2         3
  AA 0.5000000 0.1428571 0.0000000
  BB 0.3636364 0.3571429 0.2500000
  CC 0.1363636 0.5000000 0.7500000
> boxplot(GRAMPERS$IDADE~gclust,main="IDADE",xlab="cluster")
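
The largest-jump rule used in the cluster analysis to choose the number of groups can be sketched end to end on a data set that ships with R. This is only an illustration under stated assumptions: USArrests stands in for GRAMPERS (which is not available here), and the names x, hc, h, jumps, big, n and k_suggested are hypothetical. The jump vector is built the same way as salto1, by subtracting the heights shifted one position with a leading 0.

```r
# Illustration only: USArrests (built into R) stands in for GRAMPERS.
x  <- scale(USArrests)                        # standardize the variables
hc <- hclust(dist(x, method = "euclidean"), method = "ward.D")

h     <- hc$height                            # one merge height per fusion (n - 1 values)
jumps <- h - c(0, head(h, -1))                # same construction as salto1 = alt1 - alt1.lag

# The largest jump occurs when merge number `big` is performed;
# stopping just before it leaves n - (big - 1) groups.
big         <- which.max(jumps)
n           <- nrow(x)
k_suggested <- n - (big - 1)

# Cut the tree at the suggested number of groups and count group sizes.
groups <- cutree(hc, k = k_suggested)
print(table(groups))
```

The rule is a heuristic: it is worth comparing k_suggested with the dendrogram before settling on a number of groups, as the notes do with abline and rect.hclust.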