[R] Procedure not working for actual data

jim holtman jholtman at gmail.com
Thu Feb 18 19:17:13 CET 2010


Code that works on a small subset can still break on a larger set.  Your
code made several 'unlist' calls that tore the data apart, and it is
possible that some of the transformed values were no longer aligned with
the rows of the data in the way you expected.  In that case, break the
computation down and look at the data after each substep to make sure it
is what you expect.
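
A minimal sketch of how such a misalignment can arise, assuming the real
data are not sorted by Bldgid (an assumption, since the actual data were
not posted).  The unsorted rows and the names bldgs, area_by_group,
prop_by_group, prop_by_row, keep_group and keep_row are made up for
illustration; only the columns and the thresholds come from the thread.

```r
# Hypothetical data: same columns as the sample, but NOT sorted by Bldgid.
Bldgid <- c(1003, 1000, 1001, 1000, 1002, 1003)
Maplot <- c(40000, 20000, 30000, 20001, 30001, 40001)
Area   <- c(100, 40, 50, 170, 100, 4.9)
bldgs  <- data.frame(Bldgid, Maplot, Area)

# tapply() splits Area by Bldgid and unlist() concatenates the pieces
# group by group, so the result follows group order, not row order.
area_by_group <- unlist(tapply(bldgs$Area, bldgs$Bldgid, function(x) x))
prop_by_group <- unlist(tapply(bldgs$Area, bldgs$Bldgid,
                               function(x) x / sum(x)))

# Substep check: does the rebuilt vector still line up with the column?
aligned <- isTRUE(all.equal(unname(area_by_group), bldgs$Area))
print(aligned)  # FALSE for this unsorted example

# ave() returns one value per row, in the original row order.
prop_by_row <- ave(bldgs$Area, bldgs$Bldgid, FUN = function(x) x / sum(x))

# The same filter computed from the group-ordered and row-ordered vectors.
keep_group <- !((area_by_group <= 45) |
                ((area_by_group > 45) & (prop_by_group < 0.05)))
keep_row   <- !((bldgs$Area <= 45) |
                ((bldgs$Area > 45) & (prop_by_row < 0.05)))

# Side by side: the two masks disagree on rows 1 and 2.
print(cbind(bldgs, keep_group = unname(keep_group), keep_row))
```

Here keep_group is a correct mask for the group-ordered vectors, but
applied to the rows of bldgs it keeps the 40-unit piece and drops a
100-unit piece.  Comparing each intermediate vector against the column it
was derived from, as in the all.equal() line, is one way to catch this.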

On Thu, Feb 18, 2010 at 11:50 AM, ROLL Josh F <JRoll at lcog.org> wrote:
> Hey Jim,
>   That appears to work properly with my larger data set.  That's really
> strange to me, though: why would my procedure fail on the real data even
> though the test works correctly?  I have always coded under the assumption
> that the code doesn't do anything the user doesn't tell it to, but I can't
> see a problem with my code.
>
> I am looking at a similar problem with another piece of the code right now
> where everything looks right but it just isn't giving me the right output,
> although I haven't constructed a test yet.
>
> Thanks for the help.
>
> JR
>
> ________________________________
> From: jim holtman [mailto:jholtman at gmail.com]
> Sent: Wednesday, February 17, 2010 5:09 PM
> To: ROLL Josh F
> Cc: r-help at r-project.org
> Subject: Re: [R] Procedure not working for actual data
>
> Try this on your real data:
>
>> #Sample data
>> Bldgid<-c(1000,1000,1001,1002,1003,1003)
>> Maplot<-c(20000,20001,30000,30001,40000,40001)
>> Area<-c(40,170,50,100,100,4.9)
>> #Construct Sample dataframe
>> MultiLotBldgs..<-data.frame(Bldgid,Maplot,Area)
>> #Get Building Area Proportions
>> MultiLotBldgs..$Prop <- ave(MultiLotBldgs..$Area, MultiLotBldgs..$Bldgid,
> +     FUN=function(x) x / sum(x))
>>
>> # find not too small
>> notTooSmall <- !((MultiLotBldgs..$Area <= 45) |
> +     ((MultiLotBldgs..$Area > 45) & (MultiLotBldgs..$Prop < 0.05)))
>>
>> MultiLotBldgs2.. <- MultiLotBldgs..[notTooSmall,]
>> # print out results
>> MultiLotBldgs2..
>   Bldgid Maplot Area      Prop
> 2   1000  20001  170 0.8095238
> 3   1001  30000   50 1.0000000
> 4   1002  30001  100 1.0000000
> 5   1003  40000  100 0.9532888
>>
>>
>>
>
>
> On Wed, Feb 17, 2010 at 6:58 PM, ROLL Josh F <JRoll at lcog.org> wrote:
>>
>> Sorry, just a generic list:
>>
>> Is<-list()
>>
>> I forgot to add that from my actual code.
>> ________________________________
>> From: jim holtman [mailto:jholtman at gmail.com]
>> Sent: Wednesday, February 17, 2010 3:58 PM
>> To: ROLL Josh F
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Procedure not working for actual data
>>
>> Your example does not work since "Is" is not defined.  What is it supposed
>> to be?
>>
>> On Wed, Feb 17, 2010 at 6:34 PM, LCOG1 <jroll at lcog.org> wrote:
>>>
>>> Hello all,
>>>   I have what I feel is a unique situation, which may not be resolved by
>>> this inquiry.  I have constructed the data set below so that I can give an
>>> example of what I'm doing.  The example works perfectly and I have no
>>> issues with it.  My problem arises with my actual data, which includes
>>> another 11 columns of data (used in later analysis) and a total of about
>>> 7000 cases (rows).  I mention the dimensions of the actual data because
>>> I'm wondering whether the process below would encounter problems with
>>> more data.
>>>  To be sure, the problem occurs in the last step.  Is$NotTooSmall gives me
>>> a binary (TRUE/FALSE) output that is then put back in MultiLotBldgs.. (as
>>> shown in the example) to return the cases I want to keep.
>>>  In my actual data the binary designation is correct, but when
>>> MultiLotBldgs2.. is returned it still contains the cases that are FALSE in
>>> Is$NotTooSmall.  As I said, my sample data works fine but my actual
>>> implementation does not.  Any suggestions?  I know this is not easy to
>>> answer without seeing the problem, but this is the best I can do without
>>> sending you all of my data.
>>>
>>> Cheers,
>>> JR
>>>
>>>
>>>
>>>
>>> #Sample data
>>> Bldgid<-c(1000,1000,1001,1002,1003,1003)
>>> Maplot<-c(20000,20001,30000,30001,40000,40001)
>>> Area<-c(40,170,50,100,100,4.9)
>>> #Construct Sample dataframe
>>> MultiLotBldgs..<-data.frame(Bldgid,Maplot,Area)
>>> #Get Building Areas
>>> MultiLotBldgArea.X <- unlist(tapply(MultiLotBldgs..$Area,
>>>                              MultiLotBldgs..$Bldgid,
>>>                              function(x) x))
>>>
>>> # Calculate the proportion of the total building area in each piece of
>>> # the building
>>> MultiLotBldgProp.X <- unlist(tapply(MultiLotBldgs..$Area,
>>>                              MultiLotBldgs..$Bldgid,
>>>                              function(x) x/sum(x)))
>>>
>>> #Identify buildings that should be considered for joining
>>> Is$NotTooSmall.X <- !(((MultiLotBldgArea.X <= 45) |
>>>                            ((MultiLotBldgArea.X > 45) &
>>>                             (MultiLotBldgProp.X < 0.05))))
>>>
>>> MultiLotBldgs2.. <- MultiLotBldgs..[Is$NotTooSmall.X, ]
>>>
>>> --
>>> View this message in context:
>>> http://n4.nabble.com/Procedure-not-working-for-actual-data-tp1559492p1559492.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
>
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


