[R-sig-ME] Regression analysis with small but complete dataset (fully representing reality)?

Diana Michl dianamichl at aikq.de
Mon Dec 28 13:17:42 CET 2020


Hi all,

sorry it took me a while to respond (the holidays...). Thanks very much
for your help and suggestions!

@Pat: Right, I get it. The data is completely observed and the missing
data not structural. I mostly get what you're saying about not needing
inferential statistics. I thought, though, that they give information
about relationships between variables which descriptive statistics just
can't. Like, descriptive stats can tell me means, IQRs, maybe frequency
distributions, but regressions can show how some variables /predict/
others. Or correlations show how (strongly) variables relate to one
another and whether that's likely significant or random. I could really
use methods that can do that. But if it's not possible with a dataset
such as mine, then that's the way it is.
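On Pat's view that a fully observed population needs no inference, a correlation coefficient or least-squares slope computed on all existing cases is still available: it is simply a descriptive summary of how two variables relate in that population, with no p-value attached. A minimal sketch in Python (the thread is about R, but the arithmetic is identical); the variable names and values are hypothetical, not Diana's data:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_line(x, y):
    """Least-squares line y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical census: one value pair per existing unit (not real data)
shifts_per_week = [1, 2, 2, 3, 4, 5, 5, 7]
callers_per_semester = [10, 25, 20, 40, 55, 60, 70, 95]
print(pearson_r(shifts_per_week, callers_per_semester))
print(ols_line(shifts_per_week, callers_per_semester))
```

Whether such a slope should also carry a standard error depends on whether the cases are treated as the whole population or as draws from some broader process; the sketch deliberately sidesteps that question.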

@Sree: Maybe partial least squares is what I'm looking for! I've never 
done this or heard of it. Is it much like ordinary least squares?
Thanks very much for the link, I'll look into it. I'll see how far I get 
and will gladly get back to you once I'm there. It will take a few days. 
My data sounds similar to yours indeed, except the set never represents 
less than about 70% of all existing cases.
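PLS regression does resemble OLS in that it ends with a linear prediction equation. The difference is that it first extracts a small number of latent components (weighted combinations of the predictors chosen to covary strongly with the response) and then regresses on those, which keeps the fit stable when predictors are many or collinear relative to the number of cases. Below is a minimal NIPALS sketch for a single continuous response, assuming NumPy is available; pls1_fit is a hypothetical helper, not the API of any R package, and it covers neither binary/multinomial responses nor the PLS path modeling that the linked document's title refers to. With as many components as linearly independent predictors, it reproduces the OLS fit.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 (single-response partial least squares) via NIPALS.

    Returns (coef, intercept) such that predictions are X @ coef + intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean  # centred predictors, deflated after each component
    yk = y - y_mean  # centred response, deflated after each component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)      # weight vector: direction of max covariance with y
        t = Xk @ w                  # component scores for each case
        tt = t @ t
        p = Xk.T @ t / tt           # X loadings
        c = (yk @ t) / tt           # regression of y on this component
        Xk = Xk - np.outer(t, p)    # remove this component from X
        yk = yk - c * t             # ... and from y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    # Map component coefficients back to the original predictors: B = W (P'W)^-1 q
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


# Hypothetical example: 16 cases, 20 predictors (more predictors than cases)
rng = np.random.default_rng(1)
X = rng.normal(size=(16, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=16)
coef, intercept = pls1_fit(X, y, n_components=2)
print(np.round(coef[:3], 2))
```

In practice the number of components would be chosen by cross-validation rather than fixed at 2 as in this illustration.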

Best

Diana


On 26.12.2020 at 07:27, sree datta wrote:
> Hi Diana
>
> In addition to using descriptive statistics, I would also recommend
> using Partial Least Squares (PLS) regression, which was specifically
> designed for problems with a small sample size and many variables. (Your
> dependent variable can be continuous, binary or multinomial in PLS.) I
> have successfully used PLS regression in the medical/healthcare arena
> for rare and orphan disease analyses, where the affected population is
> very small and data from 30 patients represents anywhere from 25% to
> 60% of the overall population.
>
> I strongly recommend this excellent resource (a detailed, 235-page PDF)
> by Gaston Sanchez on his website:
> https://www.gastonsanchez.com/PLS_Path_Modeling_with_R.pdf
>
> Hope this helps. If you have any questions or need additional 
> information please get back to me and I can help you in identifying 
> whether PLS regression would be relevant and helpful for you.
>
> Sree
>
>
> On Fri, Dec 25, 2020 at 12:08 PM Patrick (Malone Quantitative)
> <malone at malonequantitative.com> wrote:
>
>     Diana,
>
>     cc'ing the list again in case anyone else has input
>
>     I was asking whether the missingness was structural: for example,
>     hours per shift for someone who is unemployed at the time of
>     measurement. In that scenario, you could have missing "values" but
>     still completely observed *data*.
>
>     Normally, I would assume that questions about missing data refer to
>     incomplete observation, but you clearly have a special situation,
>     which is why I asked.
>
>     If your population data is completely observed, again, you don't need
>     inferential statistics.
>
>     If not, you do indeed have a sample of the data, not the population,
>     even though you have most of it. I believe there are corrections that
>     need to be made to inferential statistics for small populations. I
>     don't have experience with that, but that might get you started.
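The correction alluded to here is presumably the finite population correction (FPC) from survey sampling. Under simple random sampling without replacement of n units from a population of N, the estimated standard error of a mean is multiplied by sqrt(1 - n/N), so it shrinks toward zero as the sample approaches the whole population. A minimal Python sketch; it assumes the unobserved units are missing at random, which may not hold for the 1-4 cases missing in this thread, and mean_se_fpc is a hypothetical helper:

```python
from math import sqrt
from statistics import stdev


def mean_se_fpc(sample, population_size):
    """Standard error of a sample mean under simple random sampling
    without replacement, including the finite population correction."""
    n = len(sample)
    if not 2 <= n <= population_size:
        raise ValueError("need 2 <= n <= population_size")
    fpc = 1 - n / population_size  # 0 when the whole population is observed
    return stdev(sample) / sqrt(n) * sqrt(fpc)


# Hypothetical: 12 of 16 existing units observed
values = [3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 9]
naive_se = stdev(values) / sqrt(len(values))
print(naive_se, mean_se_fpc(values, 16))  # the FPC halves the SE here
```

When n equals N the corrected standard error is exactly zero, which matches the point that a fully observed population needs no inferential statistics.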
>
>     Pat
>
>     On Fri, Dec 25, 2020 at 9:55 AM Diana Michl <dianamichl at aikq.de>
>     wrote:
>
>     > Hi Pat,
>     >
>     > thanks very much for your help! It helps me see things a bit more
>     > clearly. Well, the present values aren't the only ones that could
>     > exist. There are questions like "How long is your shift?", which
>     > could be 3, 4, or 5 hours; "How many shifts per week do you have?",
>     > which could be between 1 and 7; or "How many callers do you have per
>     > semester?", which could in theory be anywhere between 0 and
>     > thousands. Of course, only one response to each question is actually
>     > true.
>     > (Maybe I'm misunderstanding your question, though, because you
>     > probably didn't mean whether there could be only one possible
>     > response to every question, right?)
>     >
>     > Diana
>     >
>     >
>     > On 24.12.2020 at 17:22, Patrick (Malone Quantitative) wrote:
>     >
>     > Diana,
>     >
>     > It depends on the nature of the missingness. Are the present values
>     > the only ones that could exist? If so, you have the entire
>     > population's data, and descriptive statistics are in fact preferable
>     > to inferential ones. There's no need to run inferential statistics if
>     > you have the population; they are by definition for inferring
>     > population values from a sample.
>     >
>     > Pat
>     >
>     > On Thu, Dec 24, 2020 at 6:21 AM Diana Michl <dianamichl at aikq.de>
>     > wrote:
>     >
>     >> I have a repeated measures design with about 16 cases and 5-6
>     >> measurement points. Sometimes 1-4 full cases or some measurement
>     >> points are missing. (The measures are 20 numerical and categorical
>     >> variables taken from questionnaires.)
>     >>
>     >> The key point is: it's a small dataset with holes in it, but the 16
>     >> cases are all that even exist. So they fully represent reality
>     >> wherever they're complete.
>     >>
>     >> I wanted to run logistic regressions with up to 6 predictors. But can
>     >> I do that? I know about the many problems such small datasets pose
>     >> for regression analysis, but do they matter as much if there aren't
>     >> any more cases in reality?
>     >> Are descriptive analyses the only ones I can use?
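One concrete way to see the small-sample problem for logistic regression is the widely quoted rule of thumb of roughly 10 events per variable (EPV), where "events" are the cases in the rarer outcome category and a categorical predictor with k levels uses k - 1 parameters. It is a heuristic, not a hard limit, but with 16 cases even a perfectly balanced outcome gives at most 8 / 6 ≈ 1.3 EPV for six predictors. A minimal Python sketch with a hypothetical outcome vector; events_per_variable is an illustrative helper, not a library function:

```python
def events_per_variable(outcomes, n_parameters):
    """Events per variable for a binary outcome: the count of the less
    frequent outcome category divided by the number of predictor parameters
    (a categorical predictor with k levels counts as k - 1 parameters)."""
    ones = sum(1 for v in outcomes if v)
    minority = min(ones, len(outcomes) - ones)
    return minority / n_parameters


# Hypothetical binary outcome for 16 cases (6 "yes", 10 "no")
outcome = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
print(events_per_variable(outcome, 6))  # 1.0, far below the ~10 rule of thumb
```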
>     >>
>     >> Many thanks
>     >>
>     >> --
>     >> Dr. Diana Michl
>     >> #www.diana-michl.de
>     >>
>     >> #Film: Der unberührte Garten (The Untouched Garden), an unusual story
>     >> about growing up (www.vimeo.com/148014360)
>     >>
>     >> #Music: Singer-songwriter (www.youtube.com/user/ghiaghiafy)
>     >>
>     >>
>     >>
>     >> _______________________________________________
>     >> R-sig-mixed-models at r-project.org mailing list
>     >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     >>
>     >
>     >
>     > --
>     > Patrick S. Malone, Ph.D., Malone Quantitative
>     > NEW Service Models: http://malonequantitative.com
>     >
>     > He/Him/His
>     >
>     >
>
>
>
>
>

