TAGS :Viewed: 2 - Published at: a few seconds ago

[ How to make a table shows the change time by time in r ]

My data looks like

ID  Joint_time  leave_time  group
1   201501      201603      2
2   201508      201601      2
3   201503      201601      2
4   201512      201601      3
5   201511      201602      2
6   201503      .           1
7   201503      .           1
8   201506      201602      3
9   201507      .           1
10  201503      .           1
11  201601      201602      2
12  201601      .           1
13  201601      201603      2
14  201601      201602      3
15  201601      201602      3
16  201602      .           1
17  201602      .           1
18  201602      201603      3
19  201602      .           1
20  201602      .           1
21  201602      .           1
22  201603      .           1
23  201603      .           1
24  201603      .           1
25  201603      .           1
26  201603      .           1
27  201603      .           1
28  201603      .           1

I want to know the change and total customer number in the end of each month. I want to demonstrate the leaving and joining customer number. I only know to use table(). But this code does not seem to deal with this kind of complex table. My data is as following

 ID<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28)
Joint_time<-c("201501","201508","201503","201512","201511","201503","201503","201506","201507","201503","201601","201601","201601","201601","201601","201602","201602","201602","201602","201602","201602","201603","201603","201603","201603","201603","201603","201603")
leave_time<-c("201603","201601","201601","201601","201602",".",".","201602",".",".","201602",".","201603","201602","201602",".",".","201603",".",".",".",".",".",".",".",".",".",".")
group<-c(2,2,2,3,2,1,1,3,1,1,2,1,2,3,3,1,1,3,1,1,1,1,1,1,1,1,1,1)
question_table<-data.frame(ID,Joint_time,leave_time,group)

I would like to build a table as following

                                               201601   201602  201603
Total number in month beginning                    10       12      13
Joint this month                                    5        6       7
Group 2 who joint during 2015 leave this month      2        1       1
Group 2 who joint during 2016 leave this month      0        1       1
Group 3 who joint during 2015 leave this month      1        1       0
Group 3 who joint during 2016 leave this month      0        2       1
Total number in month end                          12       13      17

Thanks in advance.

Answer 1


I am going to help with each part of the needed output as I don't believe it is a good idea to put all that data in that format in a single data frame. If you definitely require it in that format, I may edit the answer.

To compute the number of people from different groups and join years leaving, you can using a combination of dplyr and tidyr packages as follows:

library(dplyr)
library(tidyr)
question_table %>%
  filter(leave_time != '.') %>%
  mutate(Joint_year = substr(Joint_time, 1, 4)) %>%
  group_by(group, leave_time, Joint_year) %>%
  summarise(left = n()) %>%
  spread(leave_time, left, fill = 0)

Returned output is as follows:

Source: local data frame [4 x 5]
Groups: group [2]

  group Joint_year 201601 201602 201603
  (dbl)      (chr)  (dbl)  (dbl)  (dbl)
1     2       2015      2      1      1
2     2       2016      0      1      1
3     3       2015      1      1      0
4     3       2016      0      2      1

To summarise, how many people joined in each of those 2016 months, you can do something like this:

question_table %>%
  filter(Joint_time %in% c('201601', '201602', '201603')) %>%
  group_by(Joint_time) %>%
  summarise(joined = n()) %>%
  spread(Joint_time, joined, fill = 0)

Source: local data frame [1 x 3]

  201601 201602 201603
   (dbl)  (dbl)  (dbl)
1      5      6      7

In this case, you are better off avoiding the spread at the end and retaining the data in long format. But, that is up to you.

As for the last part of getting total customers at the beginning of each period, you can do something like this:

question_table$Joint_time <- as.character(question_table$Joint_time)
question_table$leave_time <- as.character(question_table$leave_time)

df <- data.frame(numberBeginning = sapply(sort(unique(question_table$leave_time[question_table$leave_time != '.'])), function(x) nrow(filter(question_table, Joint_time < x, leave_time == '.' | leave_time >= x))))

If you want to get the last one in wide format, it needs some more work:

df$period <- row.names(df)
row.names(df) <- NULL
df <- spread(df, period, numberBeginning)

  201601 201602 201603
1     10     12     13

You can modify the above code slightly to get the last bit of information on numbers ending as follows:

df <- data.frame(numberEnding = sapply(sort(unique(question_table$leave_time[question_table$leave_time != '.'])), function(x) nrow(filter(question_table, Joint_time <= x, leave_time == '.' | leave_time > x))))
df$period <- row.names(df)
row.names(df) <- NULL
df <- spread(df, period, numberEnding)
df
  201601 201602 201603
1     12     13     17