R - Creating Repeated Start and End Dates
I have a data set with many variables. The ones of interest are: id, episode, start, end, and assessment date. An example data set is shown below:
    id  episode  start     end         assessmentdate
    1   1        1/1/2012  12/21/2012  1/1/2012
    1   1        1/1/2010  12/21/2012  12/12/2012
    1   1        1/1/2010  12/21/2012  12/21/2012
    1   2        1/1/2013  .           1/2/2013
    1   2        1/1/2013  .           2/2/2013
    1   2        1/1/2013  .           3/2/2013
    2   1        1/1/2012  .           4/1/2012
    2   1        1/1/2010  .           5/12/2012
    2   1        1/1/2010  .           6/21/2012
    2   2        1/1/2013  .           7/2/2013
    2   2        1/1/2013  .           8/2/2013
    2   2        1/1/2013  .           9/2/2013
I have start dates for everyone, but not end dates. I want to identify the end date of each episode for each patient, across 10,000 patients. The end date should be the last assessment date per episode number, and it should be present on every row between the first and last assessment dates.
I have been reading a bit about splitting the data set into many smaller parts based on id and episode, but I feel there should be a simpler way to do this. I'm new to R, coming from SAS, and this issue would not give me trouble in SAS.
I would appreciate any input you may have regarding these data preparations.
You can find the maximum assessment date for each episode using ddply() from the plyr library:
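    library(plyr)

    # Example data: one patient with two episodes of three assessments each
    df <- data.frame(id = 1,
                     episode = c(1, 1, 1, 2, 2, 2),
                     assessmentdate = as.Date(c("2012-01-01", "2012-12-12", "2012-12-21",
                                                "2013-01-02", "2013-02-02", "2013-03-02")))

    # For each episode, add the latest assessment date as the end date on every row
    df <- ddply(df, .(episode), transform, end = max(assessmentdate))
    df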
Which gives you:
      id episode assessmentdate        end
    1  1       1     2012-01-01 2012-12-21
    2  1       1     2012-12-12 2012-12-21
    3  1       1     2012-12-21 2012-12-21
    4  1       2     2013-01-02 2013-03-02
    5  1       2     2013-02-02 2013-03-02
    6  1       2     2013-03-02 2013-03-02
If you want this per patient instead, you can use ddply() with .(id) (assuming id identifies patients) or something similar.
It's also possible with by(), but that becomes a bit more complicated because it splits the data into lists identified by the values of the grouping variable.
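To show what that extra work looks like, here is a rough sketch of the by() route on the same toy data. The names pieces and res_by are made up for this illustration, and stacking the pieces with do.call(rbind, ...) is just one common way to put them back together:

```r
# Toy data mirroring the example above (one patient, two episodes)
df <- data.frame(
  id = 1,
  episode = c(1, 1, 1, 2, 2, 2),
  assessmentdate = as.Date(c("2012-01-01", "2012-12-12", "2012-12-21",
                             "2013-01-02", "2013-02-02", "2013-03-02"))
)

# by() passes each episode's rows to the function and returns a list-like object
pieces <- by(df, df$episode, function(d) {
  d$end <- max(d$assessmentdate)  # last assessment date within this episode
  d
})

# The pieces have to be stacked back into one data frame by hand
res_by <- do.call(rbind, pieces)
rownames(res_by) <- NULL  # drop the group-prefixed row names added by rbind
res_by
```

The result should match the ddply() version, but the split and recombine steps are now explicit.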
Edit: Also, if episode is not unique over the entire data frame, i.e. it repeats for each patient, group by both variables, i.e. ddply(df, .(id, episode), ...).
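As a sketch of grouping by both variables, the example below uses two patients whose episode numbers repeat, with assessment dates taken from the question's sample (read as month/day/year). It swaps in base R's ave() so that it runs without plyr; the names df2 and last_day are made up for this illustration. With plyr loaded, ddply(df2, .(id, episode), transform, end = max(assessmentdate)) should give the same end column.

```r
# Two patients whose episode numbers repeat, based on the question's sample data
df2 <- data.frame(
  id = rep(c(1, 2), each = 6),
  episode = rep(c(1, 1, 1, 2, 2, 2), times = 2),
  assessmentdate = as.Date(c(
    "2012-01-01", "2012-12-12", "2012-12-21",
    "2013-01-02", "2013-02-02", "2013-03-02",
    "2012-04-01", "2012-05-12", "2012-06-21",
    "2013-07-02", "2013-08-02", "2013-09-02"
  ))
)

# ave() returns one value per row, computed within each id/episode group.
# Working on the numeric day counts avoids relying on how ave() handles Dates.
last_day <- ave(as.numeric(df2$assessmentdate), df2$id, df2$episode, FUN = max)
df2$end <- as.Date(last_day, origin = "1970-01-01")
df2
```

Each row now carries the last assessment date of its own patient and episode, so episode 1 of patient 1 and episode 1 of patient 2 get different end dates.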