[R] Procedure not working for actual data

Bert Gunter gunter.berton at gene.com
Thu Feb 18 19:38:01 CET 2010


?traceback may be useful.

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of jim holtman
Sent: Thursday, February 18, 2010 10:17 AM
To: ROLL Josh F
Cc: r-help at r-project.org
Subject: Re: [R] Procedure not working for actual data

Even though it may work for a small subset, it can still break on
larger sets.  Your code made several 'unlist' calls that tear the data
apart, and it is possible that some of the transformed results were no
longer aligned with the rows of the data in the way you expected.
What you need to do in that case is break the procedure into substeps
and look at the data after each one to make sure it is what you are
expecting.
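
As a minimal sketch of how such a misalignment could arise (an
assumption, not something confirmed in this thread): tapply() returns
its groups in sorted Bldgid order, so unlist(tapply(...)) only matches
the row order of the data frame when the rows are already sorted by
Bldgid, as they happen to be in the sample data.  The rows below are
deliberately shuffled, and the names bldgs, keep_grouped and
keep_aligned are made up for illustration.  ave() returns its result in
the original row order, so it serves as the comparison.

```r
# Hypothetical data: same kind of columns as the sample, but NOT sorted by Bldgid
Bldgid <- c(1003, 1000, 1001, 1000, 1002, 1003)
Area   <- c(100, 40, 50, 170, 100, 4.9)
bldgs  <- data.frame(Bldgid, Area)

# tapply() groups by sorted Bldgid; unlist() concatenates the groups in that order
area_by_group <- unlist(tapply(bldgs$Area, bldgs$Bldgid, function(x) x))
prop_by_group <- unlist(tapply(bldgs$Area, bldgs$Bldgid, function(x) x / sum(x)))

# Substep check: does the regrouped vector still line up with the rows?
identical(unname(area_by_group), bldgs$Area)   # FALSE for this shuffled data

# Filter built from the regrouped vectors (in group order, not row order)
keep_grouped <- !((area_by_group <= 45) |
                  ((area_by_group > 45) & (prop_by_group < 0.05)))

# Filter built with ave(), which keeps the original row order
prop_aligned <- ave(bldgs$Area, bldgs$Bldgid, FUN = function(x) x / sum(x))
keep_aligned <- !((bldgs$Area <= 45) |
                  ((bldgs$Area > 45) & (prop_aligned < 0.05)))

# Rows where the two filters disagree are the ones that would be kept or dropped in error
bldgs[unname(keep_grouped) != keep_aligned, ]
```

Here keep_grouped is FALSE TRUE TRUE TRUE TRUE FALSE while keep_aligned
is TRUE FALSE TRUE TRUE TRUE FALSE, so indexing the data frame with
keep_grouped would keep the 40-unit piece of building 1000 and drop the
100-unit piece of building 1003.  Running a comparison like this after
each substep is one way to check that the data is what you expect.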

On Thu, Feb 18, 2010 at 11:50 AM, ROLL Josh F <JRoll at lcog.org> wrote:
> Hey Jim,
>   That appears to work properly with my larger data set.  That's really
> strange to me though; why would my procedure not work even though the test
> works correctly?  I have always coded under the assumption that the code
> doesn't do anything the user doesn't tell it to, but I can't see a problem
> with my code.
>
> I am looking at a similar problem with another piece of the code right now
> where everything looks right but it just isn't giving me the right output,
> although I haven't constructed a test yet.
>
> Thanks for the help.
>
> JR
>
> ________________________________
> From: jim holtman [mailto:jholtman at gmail.com]
> Sent: Wednesday, February 17, 2010 5:09 PM
> To: ROLL Josh F
> Cc: r-help at r-project.org
> Subject: Re: [R] Procedure not working for actual data
>
> Try this on your real data:
>
>> #Sample data
>> Bldgid<-c(1000,1000,1001,1002,1003,1003)
>> Maplot<-c(20000,20001,30000,30001,40000,40001)
>> Area<-c(40,170,50,100,100,4.9)
>> #Construct Sample dataframe
>> MultiLotBldgs..<-data.frame(Bldgid,Maplot,Area)
>> #Get Building Area Proportions
>> MultiLotBldgs..$Prop <- ave(MultiLotBldgs..$Area, MultiLotBldgs..$Bldgid,
> +     FUN=function(x) x / sum(x))
>>
>> # find not too small
>> notTooSmall <- !((MultiLotBldgs..$Area <= 45) | ((MultiLotBldgs..$Area > 45) &
> +     (MultiLotBldgs..$Prop < 0.05)))
>>
>> MultiLotBldgs2.. <- MultiLotBldgs..[notTooSmall,]
>> # print out results
>> MultiLotBldgs2..
>   Bldgid Maplot Area      Prop
> 2   1000  20001  170 0.8095238
> 3   1001  30000   50 1.0000000
> 4   1002  30001  100 1.0000000
> 5   1003  40000  100 0.9532888
>>
>>
>>
>
>
> On Wed, Feb 17, 2010 at 6:58 PM, ROLL Josh F <JRoll at lcog.org> wrote:
>>
>> Sorry Just a generic list
>>
>> Is<-list()
>>
>> forgot to add that from my actual code
>> ________________________________
>> From: jim holtman [mailto:jholtman at gmail.com]
>> Sent: Wednesday, February 17, 2010 3:58 PM
>> To: ROLL Josh F
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Procedure not working for actual data
>>
>> Your example does not work since "Is" is not defined.  What is it supposed
>> to be?
>>
>> On Wed, Feb 17, 2010 at 6:34 PM, LCOG1 <jroll at lcog.org> wrote:
>>>
>>> Hello all,
>>>   I have what I feel is a unique situation which may not be resolved with
>>> this inquiry.  I have constructed the data set below so that I can give an
>>> example of what I'm doing.  The example works perfectly and I have no issues
>>> with it.  My problem arises with my actual data, which includes another 11
>>> columns of data (used in later analysis) and a total of about 7000
>>> cases (rows).  I mention the dimensions of the actual data because I'm
>>> wondering if the process below would encounter problems with more data.
>>>  To be sure, the problem occurs in the last step.  Is$NotTooSmall gives me a
>>> binary output that is then put back in MultiLotBldgs.. (as shown in the
>>> example) to return the cases I want to keep.
>>>  In my actual data the binary designation is correct, but when
>>> MultiLotBldgs2.. returns it doesn't remove the cases that are FALSE in
>>> Is$NotTooSmall.  Like I said, my sample data works fine but my actual
>>> implementation does not.  Any suggestions?  I know this is not easy to
>>> answer without seeing the problem, but this is the best I can do without
>>> sending you all of my data.
>>>
>>> Cheers,
>>> JR
>>>
>>>
>>>
>>>
>>> #Sample data
>>> Bldgid<-c(1000,1000,1001,1002,1003,1003)
>>> Maplot<-c(20000,20001,30000,30001,40000,40001)
>>> Area<-c(40,170,50,100,100,4.9)
>>> #Construct Sample dataframe
>>> MultiLotBldgs..<-data.frame(Bldgid,Maplot,Area)
>>> #Get Building Areas
>>> MultiLotBldgArea.X <- unlist(tapply(MultiLotBldgs..$Area,
>>>                              MultiLotBldgs..$Bldgid,
>>>                              function(x) x))
>>>
>>> # Calculate the proportion of the total building area in each piece of
>>> # the building
>>> MultiLotBldgProp.X <- unlist(tapply(MultiLotBldgs..$Area,
>>>                              MultiLotBldgs..$Bldgid,
>>>                              function(x) x/sum(x)))
>>>
>>> #Identify buildings that should be considered for joining
>>> Is$NotTooSmall.X <- !(((MultiLotBldgArea.X <= 45) |
>>>                            ((MultiLotBldgArea.X > 45) &
>>>                             (MultiLotBldgProp.X < 0.05))))
>>>
>>> MultiLotBldgs2.. <- MultiLotBldgs..[Is$NotTooSmall.X, ]
>>>
>>> --
>>> View this message in context:
>>> http://n4.nabble.com/Procedure-not-working-for-actual-data-tp1559492p1559492.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
>
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



