[BioC] flowCore 1.22.0 broken for some FCS files

Josef Spidlen jspidlen at bccrc.ca
Tue Jun 19 00:00:57 CEST 2012


Hi Mike,
I agree that empty keyword values are illegal according to the FCS data 
file standard. Unfortunately, there are several vendors breaking this 
rule (e.g., CELLQuest/FACSCalibur, Partec, Applied Biosystems / Attune). 
Consequently, I agree with Kieran that it would be better if flowCore 
"closed one eye" and allowed reading of those files.

Technically, I believe this can be done while still being able to
distinguish whether the <delimiter_char> is an actual delimiter or part
of the keyword value. When it starts to read a keyword value, your parser
could distinguish the following states.

Right after reading the <delimiter_char> that initiates the keyword
value, the stream starts with one of:

1) <delimiter_char><delimiter_char>
means that the actual keyword value starts with <delimiter_char>
For example: "|$COM||| Delimiter starts my comment|"
(| is the <delimiter_char> in my examples)

2) <delimiter_char>x
where x is not a <delimiter_char> means that the vendor broke the 
standard and saved a keyword with an empty value.
For example: "|$COM||$CYT|Partec PAS|"
I know, this only works assuming that no keyword name includes the
<delimiter_char> as part of the name. I believe that this is a safe
assumption after having seen many, many FCS files. In the example, this
"relaxed" interpretation would mean that there are two keywords, "$COM"
(empty value) and "$CYT" (value "Partec PAS"). A strictly FCS-compliant
implementation reads this as a single keyword named "$COM|$CYT" with a
value of "Partec PAS".

3) x
where x is not a <delimiter_char> simply means that the keyword value is 
starting with character x.
For example: "|$COM|My comment|"
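
The three states above can be sketched in a few lines. This is only an
illustration in Python, not flowCore's actual parser: the function name
parse_fcs_text_relaxed is made up for this example, and it assumes the
whole TEXT segment is already in a string whose first character is the
delimiter, that no keyword name contains the delimiter, and that nothing
(such as padding) follows the final delimiter.

```python
from __future__ import annotations


def parse_fcs_text_relaxed(text: str) -> dict[str, str]:
    """Parse an FCS TEXT segment, tolerating empty keyword values.

    Hypothetical helper for illustration only. Assumes text[0] is the
    delimiter and that keyword names never contain the delimiter.
    """
    if not text:
        return {}
    delim = text[0]
    pos, n = 1, len(text)
    keywords: dict[str, str] = {}
    while pos < n:
        # Keyword name: everything up to the next single delimiter.
        end = text.find(delim, pos)
        if end == -1:
            raise ValueError("keyword name is not terminated")
        name = text[pos:end]
        if not name:
            raise ValueError(f"empty keyword name at offset {pos}")
        pos = end + 1  # skip the delimiter that initiates the value

        chars: list[str] = []
        while True:
            if pos >= n:
                raise ValueError(f"value of {name!r} is not terminated")
            ch = text[pos]
            if ch != delim:
                # State 3 (or any later ordinary character): part of value.
                chars.append(ch)
                pos += 1
            elif pos + 1 < n and text[pos + 1] == delim:
                # State 1 (or an escaped delimiter later in the value):
                # a doubled delimiter stands for one literal delimiter.
                chars.append(delim)
                pos += 2
            else:
                # Single delimiter: end of value. If chars is still empty
                # this is state 2, the non-standard empty value.
                pos += 1
                break
        keywords[name] = "".join(chars)
    return keywords


# The three examples from the list above, with | as the delimiter.
print(parse_fcs_text_relaxed("|$COM||| Delimiter starts my comment|"))
# {'$COM': '| Delimiter starts my comment'}
print(parse_fcs_text_relaxed("|$COM||$CYT|Partec PAS|"))
# {'$COM': '', '$CYT': 'Partec PAS'}
print(parse_fcs_text_relaxed("|$COM|My comment|"))
# {'$COM': 'My comment'}
```

Note that the same "doubled delimiter is a literal, single delimiter ends
the value" rule covers all three states; the only relaxation is that a
value is allowed to end before any character has been read.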

It comes down to the question of whether it is good practice to read
broken files, which essentially sends vendors the message that it is OK
to generate broken files. I hate that message, but in the end I think it
is even more important to make users happy, which is why I would argue
for changing flowCore to make it more relaxed, as described. FlowJo and
some other tools took this path, which is greatly appreciated by their
users.

Best regards,
Josef

Btw., a minor correction to Kieran's note from another email: I have
only been involved in the FCS 3.1 revision; I wasn't around in the 90s
when the FCS 3.0 standard was developed :-)


On 12-06-15 03:00 AM, bioconductor-request at r-project.org wrote:
> Date: Thu, 14 Jun 2012 13:32:37 -0700
> From: "Jiang, Mike"<wjiang2 at fhcrc.org>
> To:<bioconductor at r-project.org>
> Subject: Re: [BioC] [Bioc-devel] flowCore 1.22.0 broken for some FCS
>          files   (which it previously read without errors)
> Message-ID:<D780EAC3ADA31F488BCA74ECCD5B717E0875FB79 at ISIS.fhcrc.org>
> Content-Type: text/plain
>
> Kieran,
>
> I looked at your FCS, it has empty keyword value which does not conform to FCS 3.0 standard:
> "3.2.9 Keywords and keyword values must have lengths greater than zero. "(http://murphylab.cbi.cmu.edu/FCSAPI/FCS3.html).
>
> Particularly, this occurs at $ENDSTEXT keyword-value pairs :"\\$ENDSTEXT\\\\$ETIM..."
> Which is "byte offset to end of the supplemental TEXT segment" and really shouldn't be empty (normally it is put as "0")
>
> And "\\" is used as delimiter here, FCS 3.0 allows delimiter appears in the keyword value or keyword name as long as it is " immediately followed by a second delimiter". So the characters "\\\\" after "$ENDSTEXT" keyword is misunderstood as part of "$ETIM" by the parser here, which further messed up the parsing of subsequent string. That is why the parser is reporting error.
>
> Originally,flowCore did not handle this delimiter issue properly. It might read FCS successfully with the incorrect keyword values without notifying the user. Now,we thought it may be helpful to throw the error and let user know the issue with the TEXT segment of FCS.
>
> I have attached the TEXT Segment of your FCS file.
>
> Let me know if you have questions.
>
> Thanks,
> Mike
>> >From: Kieran O'Neill<koneill at bccrc.ca>
>> >Subject: [Bioc-devel] flowCore 1.22.0 broken for some FCS files (which it previously read without errors)
>> >Date: June 13, 2012 3:53:17 PM PDT
>> >To:bioc-devel at r-project.org
>> >Hi all
>> >
>> >I just recently came back to a project I was previously working on,
>> >and found that the most recent version of flowCore, 1.22.0, no longer
>> >reads some of my FCS files (those generated by one instrument in
>> >particular).
>> >
>> >The error it gives is:
>> >
>> >Error in fcs_text_parse(txt) : ERROR! no end found
>> >
>> >Previous versions of flowCore had no trouble reading these files, and
>> >the current version seems to read most other FCS files I have from
>> >other instruments. However, since parsing FCS files into something
>> >usable in R is probably the most important functionality in the
>> >package, having it broken is rather bad.
>> >
>> >It is also quite frustrating for me, in that no previous version of
>> >flowCore works in the current version of R (2.15.0), so I would need
>> >to downgrade the whole of R in order to downgrade to a working version
>> >of flowCore to analyse these files.
>> >
>> >I would be happy to send a sample file for debugging if needed.
>> >
>> >Thanks,
>> >Kieran
>> >
>> >_______________________________________________
>> >Bioc-devel at r-project.org  mailing list
>> >https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
Josef Spidlen, Ph.D.
Terry Fox Laboratory, BC Cancer Agency
675 West 10th Avenue, V5Z 1L3 Vancouver, BC, Canada
  
Tel: +1 (604) 675-8000 x 7755
http://www.terryfoxlab.ca/people/rbrinkman/josef.aspx


