[R-sig-eco] Subset dataframe

Jade Maggs jmaggs at ori.org.za
Wed Apr 17 07:54:49 CEST 2013


Hi Thierry, here is the output from str(cpueData). This is the whole dataset, not just a sample.

> str(cpueData)
'data.frame':	679848 obs. of  14 variables:
 $ patrol_ID     : int  51674 51674 51674 51674 51675 51675 51675 51675 51676 51676 ...
 $ start_datetime: Factor w/ 90082 levels "1996/1/1 06:30:00",..: 1 1 1 1 1 1 1 1 2 2 ...
 $ year          : int  1996 1996 1996 1996 1996 1996 1996 1996 1996 1996 ...
 $ start_locality: num  3708 3708 3708 3708 3920 ...
 $ end_locality  : num  3708 3708 3708 3708 3910 ...
 $ end_datetime  : Factor w/ 93805 levels "1996/1/1 09:00:00",..: 5 5 5 5 1 1 1 1 3 3 ...
 $ patrolHrs     : num  5.25 5.25 5.25 5.25 2.49 2.49 2.49 2.49 4 4 ...
 $ zone          : Factor w/ 15 levels "BN","BT","CV",..: 3 3 3 3 2 2 2 2 3 3 ...
 $ dist_patrol   : num  0.5 0.5 0.5 0.5 10 10 10 10 3 3 ...
 $ outing_ID     : int  51609 51610 51611 51612 51613 51614 51615 51616 51617 51618 ...
 $ num_anglers   : num  3 1 4 1 2 3 8 1 2 3 ...
 $ hours_fish    : num  0.5 0.5 1 1 1 0.5 2 0.5 3 4 ...
 $ ang_hours     : num  1.5 0.5 4 1 2 1.5 16 0.5 6 12 ...
 $ LRVS_cpue     : num  0 0 0 0 0 0 0 0 0 0 ...

Thank you

JADE MAGGS
Assistant Scientist

South African Association for Marine Biological Research
Direct Tel: +27 (31) 328 8171   Fax: +27 (31) 328 8188
E-mail: jmaggs at ori.org.za
1 King Shaka Avenue, Point, Durban 4001 KwaZulu-Natal South Africa
PO Box 10712, Marine Parade 4056 KwaZulu-Natal South Africa



 "Please consider your environment responsibly before printing this e-mail"


-----Original Message-----
From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be] 
Sent: 16 April 2013 20:13
To: Jade Maggs
Cc: r-sig-ecology at r-project.org
Subject: RE: [R-sig-eco] Subset dataframe

Please always keep the mailing list in cc when replying.

Can you send us the output of str(cpueData)?

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original Message-----
From: Jade Maggs [mailto:jmaggs at ori.org.za]
Sent: Tuesday 16 April 2013 16:17
To: ONKELINX, Thierry
Subject: RE: [R-sig-eco] Subset dataframe

Thank you for your help. I tried the dput() function without success. I have now attached the output from write.table().

I used the code exactly as you suggested:

cpueData1 <-
  cpueData[ave(cpueData$LRVS_cpue, cpueData$outingID, FUN=max) == cpueData$LRVS_cpue,]

but received the following error message:

Error in split.default(x, g) : Group length is 0 but data length > 0
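
A likely cause of this error, judging from the str(cpueData) output posted at the top of this thread: the data frame has a column named outing_ID, but the call refers to cpueData$outingID. For a column name that does not exist, `$` returns NULL, so ave() receives a grouping variable of length zero. The sketch below reproduces this on a small data frame (toy is a hypothetical stand-in built from a few rows of the sample table in the original question) and then repeats the call with the column name as str() reports it. It has not been run against the real data.

```r
# Hypothetical toy data: a few rows from the sample table in the original question.
toy <- data.frame(
  outing_ID = c(51800, 51801, 51801, 51802),
  LRVS_cpue = c(0, 0, 0.5, 0)
)

# A misspelled column name yields NULL rather than an error.
is.null(toy$outingID)

# ave() then fails inside split(), because the grouping variable has length 0.
# The error message is captured as text instead of stopping the script.
res <- tryCatch(
  ave(toy$LRVS_cpue, toy$outingID, FUN = max),
  error = function(e) conditionMessage(e)
)

# With the column name as shown by str(): keep rows equal to their group maximum.
grp_max <- ave(toy$LRVS_cpue, toy$outing_ID, FUN = max)
toy1 <- toy[grp_max == toy$LRVS_cpue, ]
toy1
```

Rows tied for the maximum within an outing are all kept by this comparison, so an extra step such as toy1[!duplicated(toy1$outing_ID), ] would be needed if exactly one row per outing is wanted.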

Thank you again very much for your help.



-----Original Message-----
From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be]
Sent: 16 April 2013 15:20
To: Jade Maggs; r-sig-ecology at r-project.org
Subject: RE: [R-sig-eco] Subset dataframe

Something like this?

cpueData[ave(cpueData$LRVS_cpue, cpueData$outingID, FUN = max) == cpueData$LRVS_cpue, ]

Untested, since you didn't provide an easy-to-copy-and-paste dataset. Use the output of dput(sample.data.frame) to provide sample data.
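
As a minimal sketch of what such sample data could look like: the data frame below is a hypothetical stand-in with only two of the columns, and the values are taken from the sample rows in the original question. dput() prints R code that recreates the object, which can be pasted straight into an e-mail.

```r
# Hypothetical stand-in for the real data; only two of the columns are shown.
sample.data.frame <- data.frame(
  outing_ID = c(51801L, 51801L, 51802L),
  LRVS_cpue = c(0, 0.5, 0)
)

# dput() prints R code that recreates the object.
dput(sample.data.frame)

# For a large data frame a small slice is usually enough, for example
#   dput(droplevels(head(cpueData, 20)))
# droplevels() matters when factor columns have many levels, because dput()
# otherwise writes out every level even for a short slice.

# Round trip: evaluating the printed text gives back an equivalent data frame.
txt <- paste(capture.output(dput(sample.data.frame)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```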

ir. Thierry Onkelinx

-----Original Message-----
From: r-sig-ecology-bounces at r-project.org [mailto:r-sig-ecology-bounces at r-project.org] On behalf of Jade Maggs
Sent: Tuesday 16 April 2013 14:13
To: r-sig-ecology at r-project.org
Subject: [R-sig-eco] Subset dataframe

Hi list, I need to subset the dataframe below by selecting rows with maximum LRVS_cpue values for each outing_ID. For example, where outing_ID == 51801, the new dataframe should have only one row with LRVS_cpue = 0.5. LRVS_cpue in all other rows should remain as 0. I have over 650 000 rows, so looping is very slow.



I have tried:

> cpueData1 <- data.frame(unique(cpueData[max(cpueData$LRVS_cpue),]))

but this does not work.



Any help would be greatly appreciated.






patrol_ID  outing_ID  num_anglers  hours_fish  ang_hours  LRVS_cpue
    51709      51795            2         3.5          7          0
    51709      51796            1         0.5        0.5          0
    51709      51797            1           1          1          0
    51709      51798            1           2          2          0
    51709      51799            5         5.5       27.5          0
    51709      51800            1           3          3          0
    51709      51801            2           1          2          0
    51709      51801            2           1          2        0.5
    51709      51802            1         1.5        1.5          0
    51709      51803            3           1          3          0
    51709      51804            4           1          4          0
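
One possible reading of why the attempt in this message does not work is sketched below on a hypothetical toy data frame built from three of the sample rows. max() is applied to the whole LRVS_cpue column, so it returns one number for the entire data set rather than one per outing_ID, and inside `[ , ]` that number is treated as a row index, not as a condition.

```r
# Hypothetical toy data: three rows from the sample table.
toy <- data.frame(
  outing_ID = c(51801, 51801, 51802),
  LRVS_cpue = c(0, 0.5, 0)
)

# max() ignores outing_ID and returns a single number: the overall maximum.
overall_max <- max(toy$LRVS_cpue)

# In the row position of [ , ] that number acts as a row index.
# Here it is 0.5, which is truncated to 0, so no rows are selected.
attempt <- toy[overall_max, ]
nrow(attempt)

# A per-outing maximum needs a grouping step, for example tapply().
per_outing <- tapply(toy$LRVS_cpue, toy$outing_ID, max)
per_outing
```

tapply() returns only the maxima, one per outing, and drops the other columns; selecting whole rows needs those maxima to be compared back against the original LRVS_cpue column.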



JADE MAGGS

Assistant Scientist



_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.


