R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Workspace loaded from ~/.RData]

> library(tidyverse)
-- Attaching packages ------------------------------------ tidyverse 1.3.1 --
ggplot2 3.3.5     purrr   0.3.4
tibble  3.1.4     dplyr   1.0.7
tidyr   1.1.3     stringr 1.4.0
readr   2.0.1     forcats 0.5.1
-- Conflicts --------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
Warning messages:
1: package ‘tidyverse’ was built under R version 4.1.1
2: package ‘tibble’ was built under R version 4.1.1
3: package ‘tidyr’ was built under R version 4.1.1
4: package ‘readr’ was built under R version 4.1.1
5: package ‘purrr’ was built under R version 4.1.1
6: package ‘stringr’ was built under R version 4.1.1
7: package ‘forcats’ was built under R version 4.1.1
> andmed=read_csv("http://www.tlu.ee/~jaagup/andmed/muu/estonia_reisijad.txt")
Rows: 989 Columns: 8
-- Column specification -----------------------------------------------------
Delimiter: ","
chr (5): Country, Firstname, Lastname, Sex, Category
dbl (3): PassengerId, Age, Survived

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
> nrow(andmed)
[1] 989
> andmed %>% nrow()
[1] 989
> andmed %>% n()
Error in n(.) : unused argument (.)
> length(andmed$PassengerId)
[1] 989
> andmed %>% nrow()
[1] 989
> andmed[andmed$Category=="P"] %>% nrow()
Error: Must subset columns with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 8 but subscript `andmed$Category == "P"` has size 989.
Run `rlang::last_error()` to see where the error occurred.
> andmed[andmed$Category=="P"] %>% length()
Error: Must subset columns with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 8 but subscript `andmed$Category == "P"` has size 989.
Run `rlang::last_error()` to see where the error occurred.
> andmed[andmed$Category=="P"]
Error: Must subset columns with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 8 but subscript `andmed$Category == "P"` has size 989.
Run `rlang::last_error()` to see where the error occurred.
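The `n()` error in this part of the session most likely arises because `n()` is a dplyr helper meant to be called inside verbs such as `summarise()` or `mutate()`, not used as a pipeline step of its own. A minimal sketch of the intended usage follows; the `toy` table and its values are made up for illustration and are not the passenger data.

```r
library(dplyr)

# Hypothetical toy data standing in for the passenger table
toy <- tibble(
  Category = c("P", "P", "C", "P"),
  Age      = c(62, 55, 31, 18)
)

# n() counts the rows of the current group inside summarise()
total <- toy %>% summarise(kogus = n())

# count() is a shortcut for group_by() + summarise(n = n())
by_category <- toy %>% count(Category)

print(total)
print(by_category)
```

For a plain row count outside of `summarise()`, `nrow()` remains the simpler choice, as the session already shows.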
> andmed$Category=="P"
[1] TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE
[20] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[39] FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[58] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
[77] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
[96] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[115] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[134] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[153] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
[172] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[191] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[210] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[229] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
[248] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[267] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
[286] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[305] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
[324] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE
[343] TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[362] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
[381] FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
[400] TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
[419] FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
[438] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[457] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE
[476] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[495] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
[514] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[533] FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
[552] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE
[571] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE
[590] FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
[609] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[628] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
[647] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[666] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
[685] FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
[704] TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE
[723] TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE
[742] TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE
[761] TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
[780] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
[799] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[818] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[837] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
[856] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[875] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
[894] FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
[913] FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE
[932] TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
[951] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[970] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[989] TRUE
> sum(andmed$Category=="P")
[1] 796
> andmed[andmed$Category=="P"]
Error: Must subset columns with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 8 but subscript `andmed$Category == "P"` has size 989.
Run `rlang::last_error()` to see where the error occurred.
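The repeated subsetting error appears to come from how single-bracket indexing treats a lone argument: with no comma, the subscript selects columns, so a logical vector with one value per row (989) cannot match the 8 columns. Putting the vector before a comma makes it a row selector instead. A small sketch, assuming a hypothetical three-row tibble named `toy` rather than the real data:

```r
library(dplyr)

# Hypothetical toy data; names mirror the passenger table's columns
toy <- tibble(
  Category = c("P", "C", "P"),
  Age      = c(62, 31, 18)
)

is_passenger <- toy$Category == "P"   # logical vector, one value per row

# Row subsetting: logical vector before the comma, all columns kept
passengers_base <- toy[is_passenger, ]

# The dplyr equivalent
passengers_dplyr <- toy %>% filter(Category == "P")

nrow(passengers_base)
nrow(passengers_dplyr)
```

Both forms keep the same rows; `filter()` additionally drops rows where the condition is `NA`, which base subsetting would keep as all-`NA` rows.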
> andmed[andmed$Category=="P", ]
# A tibble: 796 x 8
   PassengerId Country Firstname          Lastname  Sex     Age Category Survived
 1           1 Sweden  ARVID KALLE        AADLI     M        62 P               0
 2           5 Sweden  BRITTA ELISABET    AHLSTROM  F        55 P               0
 3           6 Sweden  GERD INGA MAGNHILD AHLSTROM  F        71 P               0
 4           7 Sweden  HJALMAR            AHLSTROM  M        60 P               0
 5           8 Estonia PILLE              AHMAN     F        18 P               0
 6          10 Sweden  ANNA MARIA         ALDRIN    F        63 P               0
 7          11 Sweden  LARS BERTIL        ALDRIN    M        67 P               0
 8          12 Estonia NELLI              ALEKSEEVA F        61 P               0
 9          14 Sweden  TAMARA             ALEP      F        68 P               0
10          17 Estonia ARMIDO             ALLAS     M        31 P               0
# ... with 786 more rows
> andmed[andmed$Category=="P", ] %>% nrow()
[1] 796
> andmed %>% filter(Category=="P") %>% nrow()
[1] 796
> andmed %>% filter(Category=="P", Country=="Estonia") %>% nrow()
[1] 175
> andmed %>% filter(Country=="Estonia") %>% nrow()
[1] 344
> andmed %>% filter(Country=="Estonia") %>% nrow() / andmed %>% nrow()
[1] 0.3478261
> andmed %>% filter(Country=="Estonia") %>% nrow() / andmed %>% nrow() * 100
[1] 34.78261
> andmed %>% filter(Country=="Estonia") %>% nrow() / andmed %>% nrow() * 100 %>% round(2)
[1] 34.78261
> andmed %>% filter(Country=="Estonia") %>% nrow() / andmed %>% nrow() * 100 %>% round()
[1] 34.78261
> round(andmed %>% filter(Country=="Estonia") %>% nrow() / andmed %>% nrow() * 100, 2)
[1] 34.78
> pandmed=andmed %>% filter(Category=="P")
> round(pandmed %>% filter(Country=="Estonia") %>% nrow() / pandmed %>% nrow() * 100, 2)
[1] 21.98
> round(pandmed %>% filter(Country=="Estonia") %>% nrow() / pandmed %>% nrow() * 100, 2)
[1] 21.98
> candmed=andmed %>% filter(Category=="C")
> round(candmed %>% filter(Country=="Estonia") %>% nrow() / candmed %>% nrow() * 100, 2)
[1] 87.56

round(pandmed %>% filter(Country=="Estonia") %>% nrow() / pandmed %>% nrow() * 100, 2)

andmed %>%
  filter(Country=="Estonia", Category=="P") %>%
  group_by(Survived) %>%
  summarise(kogus=n()) %>%
  ungroup() %>%
  mutate(yldkogus=sum(kogus), protsent=kogus*100/sum(kogus))

andmed %>% summarise(suurim=max(Age), vahim=min(Age), arkesk=mean(Age), med=median(Age))
andmed %>%
  filter(Category=="C", Country=="Estonia") %>%
  summarise(suurim=max(Age), vahim=min(Age), arkesk=mean(Age), med=median(Age))

hist(andmed$Age)
andmed %>% filter(Country=="Estonia", Category=="C") %>% .$Age %>% hist()
andmed %>% filter(Country=="Estonia", Category=="P") %>% .$Age %>% boxplot()
andmed %>% filter(Country %in% c("Sweden", "Finland"), Category=="P") %>% .$Age %>% boxplot()
andmed %>% filter(Country=="Sweden" | Country=="Finland", Category=="P") %>% .$Age %>% boxplot()
andmed %>% filter(Country=="Sweden") %>% .$Age %>% boxplot()

andmed %>% group_by(Country) %>% summarise(kogus=n()) %>% arrange(-kogus)
andmed %>% group_by(Country) %>% summarise(kogus=n()) %>% arrange(-kogus) %>% filter(kogus>1)
andmed %>% group_by(Country) %>% summarise(kogus=n()) %>% arrange(-kogus)
andmed %>%
  group_by(Country) %>%
  summarise(kogus=n()) %>%
  arrange(-kogus) %>%
  filter(kogus>(andmed %>% nrow() * 0.01))

andmed %>% group_by(Country) %>% summarise(keskmine=mean(Age))
andmed %>% group_by(Country) %>% summarise(keskmine=mean(Age), mediaan=median(Age)) %>% arrange(keskmine)

andmed %>% mutate(erinevus=Age-mean(Age))
andmed %>% group_by(Country) %>% mutate(erinevus=Age-mean(Age))
andmed %>% group_by(Sex) %>% mutate(erinevus=Age-mean(Age))
andmed %>% group_by(Sex) %>% summarise(kesk=mean(Age))

andmed %>% mutate(kymnend=as.integer(Age/10)*10)
andmed %>% mutate(kymnend=as.integer(Age/10)*10) %>% group_by(kymnend) %>% summarise(kogus=n())
andmed %>%
  filter(Survived==1) %>%
  mutate(kymnend=as.integer(Age/10)*10) %>%
  group_by(kymnend) %>%
  summarise(kogus=n())
andmed %>%
  mutate(kymnend=as.integer(Age/10)*10) %>%
  group_by(kymnend) %>%
  summarise(kogus=n(), elus=sum(Survived==1), osakaal=elus/kogus)
andmed %>%
  mutate(kymnend=as.integer(Age/10)*10) %>%
  filter(Sex=="F") %>%
  group_by(kymnend) %>%
  summarise(kogus=n(), elus=sum(Survived==1), osakaal=elus/kogus)
andmed %>%
  mutate(kymnend=as.integer(Age/10)*10) %>%
  filter(Sex=="F") %>%
  group_by(kymnend) %>%
  summarise(kogus=n(), elus=sum(Survived==1), osakaal=elus/kogus)

prop.test(2, 22)
prop.test(11, 75)

andmed %>%
  mutate(kymnend=as.integer(Age/10)*10) %>%
  group_by(kymnend, Sex) %>%
  summarise(kogus=n(), elus=sum(Survived==1), osakaal=elus/kogus)

# Try the same query by country
# Keep only Estonia, Finland and Sweden
andmed %>%
  mutate(kymnend=as.integer(Age/10)*10) %>%
  group_by(kymnend, Country) %>%
  summarise(kogus=n(), elus=sum(Survived==1), osakaal=elus/kogus)
andmed %>%
  mutate(kymnend=as.integer(Age/10)*10) %>%
  group_by(kymnend, Country) %>%
  summarise(kogus=n(), elus=sum(Survived==1), osakaal=elus/kogus) %>%
  select(kymnend, Country, osakaal) %>%
  spread(Country, osakaal)

andmed %>%
  group_by(Lastname) %>%
  summarise(h=sum(Survived==0), e=sum(Survived==1)) %>%
  filter(h>0, e>0) %>%
  .$Lastname -> pnimed
pnimed
andmed %>% filter(Lastname %in% pnimed)
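
Earlier in the session, appending `%>% round(2)` to the percentage expression still returned the unrounded 34.78261. A likely explanation is operator precedence: `%>%` binds more tightly than `*` and `/`, so only the literal `100` was piped into `round()`. The sketch below reuses the two counts reported in the session (344 of 989); the variable names are illustrative only.

```r
library(dplyr)

estonians <- 344   # rows with Country == "Estonia" (count from the session)
total     <- 989   # all rows (count from the session)

# Parsed as estonians / total * (100 %>% round(2)): only 100 gets rounded
unrounded <- estonians / total * 100 %>% round(2)

# Parentheses make the pipe apply to the whole product
rounded <- (estonians / total * 100) %>% round(2)

print(unrounded)
print(rounded)
```

Wrapping the expression in `round(..., 2)`, as the session eventually does, has the same effect as the parenthesised form.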