[R] select portion of text file using R
Duncan Mackay
dulcalma at bigpond.com
Tue Apr 28 03:30:00 CEST 2015
Hi Luigi
I think there may be problems with \t being equivalent to tab chr(9)
Therefore try
xlines <-
readLines(textConnection("* Block Type = Array Card Block
* Calibration Background is expired = No
* Calibration Background performed on = 2014-12-02 11:27:49 AM PST
* Calibration FAM is expired = No
* Calibration FAM performed on = 2014-12-02 12:00:20 PM PST
* Calibration ROI is expired = No
* Calibration ROI performed on = 2014-12-02 11:20:40 AM PST
* Calibration ROX is expired = No
* Calibration ROX performed on = 2014-12-02 12:11:21 PM PST
* Calibration Uniformity is expired = No
* Calibration Uniformity performed on = 2014-12-02 11:43:43 AM PST
* Calibration VIC is expired = No
* Calibration VIC performed on = 2014-12-02 11:51:59 AM PST
* Chemistry = TAQMAN
* Experiment Barcode =
* Experiment Comments =
* Experiment File Name = F:\2015-04-13 Gastro array 59 Luigi - plate 3.eds
* Experiment Name = 2015-04-13 171216
* Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
* Experiment Type = Comparative C? (??C?)
* Experiment User Name =
* Instrument Name = 278882033
* Instrument Serial Number = 278882033
* Instrument Type = ViiA 7
* Passive Reference = ROX
* Quantification Cycle Method = Ct
* Signal Smoothing On = false
* Stage/ Cycle where Analysis is performed = Stage 3, Step 2
Well Cycle Target Name Rn
1 1 Adeno 1 0.82
1 2 Adeno 1 0.93
2 1 Adeno 2 0.78") )
xlines = sub("^\\*.*$","", xlines)
xlines = xlines[nchar(xlines)>0]
xlines = sub("^[[:space:]]+","", xlines)
xlines = xlines[-1]
datc = data.frame(do.call(rbind, lapply(xlines, function(x) unlist(strsplit(x, "[[:space:]]+")))))
names(datc) = c("Well","Cycle","Target","Name","Rn")
dat = datc
for (j in c(1,2,4,5)) dat[,j] = as.numeric(dat[,j])
Regards
Duncan
Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mackay at northnet.com.au
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Luigi Marongiu
Sent: Tuesday, 28 April 2015 07:20
To: Duncan Murdoch; r-help
Subject: Re: [R] select portion of text file using R
Dear Duncan,
thank you for your reply,
I tried to read the file using skip and nrows but it did not work.
Here i am pasting the code I wrote and the head of the file i need to
read. Probably the error is due to the fact that the column "well" has
duplication, but how can i add a row column with unique row names? How
can I overcome this error?
Best regards
Luigi
CODE
raw.data<-read.table(
mydata,
header=TRUE,
row.names=31,
dec=".",
sep="\t",
skip = 30,
nrows = 17281,
row.names = 1:17281
)
HEAD OF MYDATA
* Block Type = Array Card Block
* Calibration Background is expired = No
* Calibration Background performed on = 2014-12-02 11:27:49 AM PST
* Calibration FAM is expired = No
* Calibration FAM performed on = 2014-12-02 12:00:20 PM PST
* Calibration ROI is expired = No
* Calibration ROI performed on = 2014-12-02 11:20:40 AM PST
* Calibration ROX is expired = No
* Calibration ROX performed on = 2014-12-02 12:11:21 PM PST
* Calibration Uniformity is expired = No
* Calibration Uniformity performed on = 2014-12-02 11:43:43 AM PST
* Calibration VIC is expired = No
* Calibration VIC performed on = 2014-12-02 11:51:59 AM PST
* Chemistry = TAQMAN
* Experiment Barcode =
* Experiment Comments =
* Experiment File Name = F:\2015-04-13 Gastro array 59 Luigi - plate 3.eds
* Experiment Name = 2015-04-13 171216
* Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
* Experiment Type = Comparative Cт (ΔΔCт)
* Experiment User Name =
* Instrument Name = 278882033
* Instrument Serial Number = 278882033
* Instrument Type = ViiA 7
* Passive Reference = ROX
* Quantification Cycle Method = Ct
* Signal Smoothing On = false
* Stage/ Cycle where Analysis is performed = Stage 3, Step 2
[Amplification Data]
Well \tCycle \tTarget \tName \tRn
\t1 \t1 \tAdeno 1 \t0.82
\t1 \t2 \tAdeno 1\ \t0.93
...
\t2 \t1 \tAdeno 2 \t0.78
...
On Mon, Apr 20, 2015 at 12:17 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 20/04/2015 3:28 AM, Luigi Marongiu wrote:
>> Dear all,
>> I have a flat file (tab delimited) derived from an excel file which is
>> subdivided in different parts: a first part is reporting metadata,
>> then there is a first spreadsheet indicated by [ ], then the actual
>> data and the second spreadsheet with the same format [ ] and then the
>> data.
>> How can I import such file using for instance read.table()?
>
> read.table() by itself can't recognize where the data starts, but it has
> arguments "skip" and "nrows" to control how much gets read. If you
> don't know the values for those arguments, you can use readLines() to
> read the entire file, then use grep() to recognize your table data, and
> either re-read the file, or just extract those lines and read from them
> as a textConnection.
>
> Duncan Murdoch
>
>> Many thanks
>> regards
>> Luigi
>>
>> Here is a sample of the file:
>> * Experiment Barcode =
>> * Experiment Comments =
>> * Experiment File Name = F:\array 59
>> * Experiment Name = 2015-04-13 171216
>> * Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
>> ...
>> [Amplification Data]
>> Well Cycle Target Name Rn Delta Rn
>> 1 1 Adeno 1-Adeno 1 0.820 -0.051
>> 1 2 Adeno 1-Adeno 1 0.827 -0.042
>> 1 3 Adeno 1-Adeno 1 0.843 -0.025
>> 1 4 Adeno 1-Adeno 1 0.852 -0.015
>> 1 5 Adeno 1-Adeno 1 0.858 -0.008
>> 1 6 Adeno 1-Adeno 1 0.862 -0.002
>> ...
>> [Results]
>> Well Well Position Omit Sample Name Target Name Task
>> Reporter Quencher RQ RQ Min RQ Max CT Ct Mean Ct
>> SD Quantity Delta Ct Mean Delta Ct SD Delta Delta Ct
>> Automatic Ct Threshold Ct Threshold Automatic Baseline
>> Baseline Start Baseline End Efficiency Comments Custom1
>> Custom2 Custom3 Custom4 Custom5 Custom6 NOAMP
>> EXPFAIL
>> 1 A1 false P17 Adeno 1-Adeno 1 UNKNOWN FAM
>> NFQ-MGB Undetermined false
>> 0.200 true 3 44 1.000 N/A N
>> Y
>> 2 A2 false P17 Adeno 40/41 EH-AIQJCT3 UNKNOWN FAM
>> NFQ-MGB Undetermined
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list