[BioC] Query regarding SomaticSignatures bioconductor package
Julian Gehring
julian.gehring at embl.de
Wed Jun 11 05:30:51 CEST 2014
Hi Anand,
> 1. I have data from a single study (AML) with mutations obtained from 14 patients. In this case, how do I group the data? If I group the data by 'study' as in the vignette, I get an error when running the nmfSignatures function. (I guess it's because the matrix (sca_occurance) has only one column, corresponding to the single study performed.) Can I group it based on patients (sampleNames) instead?
You can group your variants by any variable that is present in the
'VRanges' object that contains your calls. The object behaves much like
a data frame, so you could add a column with
  x$sample = ... ## your 14 samples ##
and then group by it with
  motifMatrix(x, group = "sample")
If your samples are already stored in the column 'sampleNames', you can
also refer to that column directly (see '?mutationContext' for an example).
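
To make the effect of the grouping variable concrete, here is a minimal base-R sketch. It does not call SomaticSignatures at all; it only mimics the idea of counting motifs per group using a plain data frame. The data frame 'calls', its motif labels, and the patient IDs are made up for illustration.

```r
# Hypothetical toy calls: one row per variant, mimicking a data-frame
# view of a 'VRanges' object (motif labels and patient IDs are invented).
calls <- data.frame(
  motif  = c("CA A.G", "CT C.G", "CA A.G", "TG T.A", "CT C.G", "CA A.G"),
  sample = c("P01", "P01", "P02", "P02", "P03", "P03"),
  study  = "AML",
  stringsAsFactors = FALSE
)

# Count motifs per group: rows are motifs, columns are groups.
by_study  <- unclass(table(calls$motif, calls$study))
by_sample <- unclass(table(calls$motif, calls$sample))

dim(by_study)   # a single column: all variants fall into one study
dim(by_sample)  # one column per patient
```

With a single study the matrix has only one column, which leaves nothing to decompose; grouping by patient gives one column per sample.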
> 2. How do I choose the number R (number of signatures to obtain)? I guess it should be less than the number of columns of sca_occurances? In a recent publication (Niccolo Bolli et al., 2013, Nat. Commun.) involving a single study (multiple myeloma with 52 patients), they mention that they have found two signatures. Does it mean they have set the number of signatures (R argument in nmfSignatures()) to 2?
For estimating the number of signatures, there are several approaches.
Whether and how well they perform depends largely on the input data, and
none of them will work reliably in all cases. For this reason, I haven't
implemented an estimation for the number of signatures so far - I want
to avoid giving a false sense of security/certainty.
On the practical side, most of the information will be contained in the
first few signatures; increasing the number of signatures further will
add little information. From a biological point of view, each signature
should result from a different mutation-generating process. In your
setting with 14 patients suffering from the same type of cancer, one
would expect a low number of such processes.
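
The diminishing return from additional signatures can be illustrated with a small base-R sketch. This is not what SomaticSignatures does internally: the data are synthetic (built from two invented signatures), the NMF is a bare-bones multiplicative-update implementation, and the residual sum of squares is only one possible heuristic. As noted above, no such measure will work reliably in all cases.

```r
set.seed(1)

# Synthetic motif matrix: 96 motifs x 14 samples, generated from
# 2 hypothetical signatures plus a small amount of noise.
n_motifs <- 96; n_samples <- 14; true_r <- 2
W_true <- matrix(rexp(n_motifs * true_r), n_motifs, true_r)
H_true <- matrix(rexp(true_r * n_samples), true_r, n_samples)
noise  <- matrix(runif(n_motifs * n_samples, 0, 0.05), n_motifs, n_samples)
M <- W_true %*% H_true + noise

# Minimal NMF with multiplicative updates (squared-error objective);
# returns the residual sum of squares of the rank-r approximation.
nmf_rss <- function(M, r, n_iter = 500, eps = 1e-9) {
  W <- matrix(runif(nrow(M) * r), nrow(M), r)
  H <- matrix(runif(r * ncol(M)), r, ncol(M))
  for (i in seq_len(n_iter)) {
    H <- H * (t(W) %*% M) / (t(W) %*% W %*% H + eps)
    W <- W * (M %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  sum((M - W %*% H)^2)
}

# Residual error for 1 to 5 signatures
rss <- sapply(1:5, function(r) nmf_rss(M, r))
print(round(rss, 3))
```

On data like these, the residual typically drops sharply up to the true number of signatures and changes little afterwards. Real data are rarely this clean, so such a curve is at best a rough guide.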
I hope this made things a bit clearer.
Best wishes
Julian