[ How to make a table shows the change time by time in r ]
My data looks like
ID Joint_time leave_time group
1 201501 201603 2
2 201508 201601 2
3 201503 201601 2
4 201512 201601 3
5 201511 201602 2
6 201503 . 1
7 201503 . 1
8 201506 201602 3
9 201507 . 1
10 201503 . 1
11 201601 201602 2
12 201601 . 1
13 201601 201603 2
14 201601 201602 3
15 201601 201602 3
16 201602 . 1
17 201602 . 1
18 201602 201603 3
19 201602 . 1
20 201602 . 1
21 201602 . 1
22 201603 . 1
23 201603 . 1
24 201603 . 1
25 201603 . 1
26 201603 . 1
27 201603 . 1
28 201603 . 1
I want to know the change and total customer number in the end of each month. I want to demonstrate the leaving and joining customer number. I only know to use table(). But this code does not seem to deal with this kind of complex table. My data is as following
ID<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28)
Joint_time<-c("201501","201508","201503","201512","201511","201503","201503","201506","201507","201503","201601","201601","201601","201601","201601","201602","201602","201602","201602","201602","201602","201603","201603","201603","201603","201603","201603","201603")
leave_time<-c("201603","201601","201601","201601","201602",".",".","201602",".",".","201602",".","201603","201602","201602",".",".","201603",".",".",".",".",".",".",".",".",".",".")
group<-c(2,2,2,3,2,1,1,3,1,1,2,1,2,3,3,1,1,3,1,1,1,1,1,1,1,1,1,1)
question_table<-data.frame(ID,Joint_time,leave_time,group)
I would like to build a table as following
201601 201602 201603
Total number in month beginning 10 12 13
Joint this month 5 6 7
Group 2 who joint during 2015 leave this month 2 1 1
Group 2 who joint during 2016 leave this month 0 1 1
Group 3 who joint during 2015 leave this month 1 1 0
Group 3 who joint during 2016 leave this month 0 2 1
Total number in month end 12 13 17
Thanks in advance.
Answer 1
I am going to help with each part of the needed output as I don't believe it is a good idea to put all that data in that format in a single data frame. If you definitely require it in that format, I may edit the answer.
To compute the number of people from different groups and join years leaving, you can using a combination of dplyr
and tidyr
packages as follows:
library(dplyr)
library(tidyr)
question_table %>%
filter(leave_time != '.') %>%
mutate(Joint_year = substr(Joint_time, 1, 4)) %>%
group_by(group, leave_time, Joint_year) %>%
summarise(left = n()) %>%
spread(leave_time, left, fill = 0)
Returned output is as follows:
Source: local data frame [4 x 5]
Groups: group [2]
group Joint_year 201601 201602 201603
(dbl) (chr) (dbl) (dbl) (dbl)
1 2 2015 2 1 1
2 2 2016 0 1 1
3 3 2015 1 1 0
4 3 2016 0 2 1
To summarise, how many people joined in each of those 2016 months, you can do something like this:
question_table %>%
filter(Joint_time %in% c('201601', '201602', '201603')) %>%
group_by(Joint_time) %>%
summarise(joined = n()) %>%
spread(Joint_time, joined, fill = 0)
Source: local data frame [1 x 3]
201601 201602 201603
(dbl) (dbl) (dbl)
1 5 6 7
In this case, you are better off avoiding the spread
at the end and retaining the data in long format. But, that is up to you.
As for the last part of getting total customers at the beginning of each period, you can do something like this:
question_table$Joint_time <- as.character(question_table$Joint_time)
question_table$leave_time <- as.character(question_table$leave_time)
df <- data.frame(numberBeginning = sapply(sort(unique(question_table$leave_time[question_table$leave_time != '.'])), function(x) nrow(filter(question_table, Joint_time < x, leave_time == '.' | leave_time >= x))))
If you want to get the last one in wide format, it needs some more work:
df$period <- row.names(df)
row.names(df) <- NULL
df <- spread(df, period, numberBeginning)
201601 201602 201603
1 10 12 13
You can modify the above code slightly to get the last bit of information on numbers ending as follows:
df <- data.frame(numberEnding = sapply(sort(unique(question_table$leave_time[question_table$leave_time != '.'])), function(x) nrow(filter(question_table, Joint_time <= x, leave_time == '.' | leave_time > x))))
df$period <- row.names(df)
row.names(df) <- NULL
df <- spread(df, period, numberEnding)
df
201601 201602 201603
1 12 13 17