[R] reading lisp file in R

peter dalgaard pdalgd at gmail.com
Thu Jan 18 18:18:07 CET 2018


Yes, and the structure is obviously case-insensitive. Probably more troublesome is that there can be multiple ACADEMIC-EMPHASIS entries, which can be tricky to tidy. One would also need to figure out the meaning of lines like

(DEFPROP BOSTON-COLLEGE0 T DUPLICATE)
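
For what it is worth, here is a rough sketch (in Python, just for illustration) of one way to cope with the points above: fold every symbol to upper case, keep repeated keys as lists rather than overwriting them, and set aside any form that is not a DEF-INSTANCE (DEFPROP lines, stray parentheses from the embedded mail headers) for later inspection. The sample text is stitched together from lines quoted in this thread, and the second ACADEMIC-EMPHASIS value (HISTORY) is made up; the function names are hypothetical. It assumes the file has no Lisp strings or ";" comments containing parentheses, which I have not checked.

```python
import re
from collections import defaultdict

# Sample assembled from lines quoted in this thread; HISTORY is a made-up value.
SAMPLE = """\
(def-instance Georgetown
 (STATE MARYLAND)
 (ACADEMIC-EMPHASIS HEALTH-SCIENCE)
 (academic-emphasis HISTORY)
 (QUALITY-OF-LIFE SCALE:1-5 4)
)
-------
From LEBOWITZ at cs.columbia.edu Mon Feb 22 20:53:02 1988
Received: from zodiac by meridian (5.52/4.7)
Subject: bigger file part 3

(DEFPROP BOSTON-COLLEGE0 T DUPLICATE)
"""


def tokenize(text):
    """Split text into '(' , ')' and runs of non-space, non-paren characters."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse_forms(tokens):
    """Yield each top-level parenthesised form as a nested list.

    Symbols are upper-cased (assuming case carries no meaning) and any
    text outside parentheses, such as mail headers, is silently skipped.
    """
    stack = []
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if not stack:
                continue  # unmatched ')' in junk text
            form = stack.pop()
            if stack:
                stack[-1].append(form)
            else:
                yield form
        elif stack:
            stack[-1].append(tok.upper())


def collect_instances(text):
    """Return ({name: {key: [values, ...]}}, [all other top-level forms])."""
    instances, other = {}, []
    for form in parse_forms(tokenize(text)):
        if len(form) >= 2 and form[0] == "DEF-INSTANCE" and isinstance(form[1], str):
            attrs = defaultdict(list)
            for prop in form[2:]:
                if isinstance(prop, list) and prop and isinstance(prop[0], str):
                    # Repeated keys accumulate instead of overwriting.
                    attrs[prop[0]].append(prop[1:])
            instances[form[1]] = dict(attrs)
        else:
            other.append(form)
    return instances, other


instances, other = collect_instances(SAMPLE)
print(instances)
print(other)
```

Each attribute ends up as a list of value lists, so a multi-valued key such as ACADEMIC-EMPHASIS can later be expanded to one row per value or pasted into a single field, whichever suits. The leftover forms are where one would look to work out what the DEFPROP lines are for.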

-pd

> On 18 Jan 2018, at 18:04 , Barry Rowlingson <b.rowlingson at lancaster.ac.uk> wrote:
> 
> The file also has a bunch of email headers stuck in the middle of it:
> 
> 
> .....
> 
> (QUALITY-OF-LIFE SCALE:1-5 4)
>  (ACADEMIC-EMPHASIS HEALTH-SCIENCE)
> )
> -------
> -------
> 
> From LEBOWITZ at cs.columbia.edu Mon Feb 22 20:53:02 1988
> Received: from zodiac by meridian (5.52/4.7)
> Received: from Jessica.Stanford.EDU by ads.com (5.58/1.9)
>    id AA04539; Mon, 22 Feb 88 20:59:59 PST
> Received: from Portia.Stanford.EDU by jessica.Stanford.EDU with TCP;
>    Mon, 22 Feb 88 20:58:22 PST
> Received: from columbia.edu (COLUMBIA.EDU.ARPA) by Portia.STANFORD.EDU
>    (1.2/Ultrix2.0-B)
>    id AA11480; Mon, 22 Feb 88 20:49:53 pst
> Received: from CS.COLUMBIA.EDU by columbia.edu (5.54/1.14)
>    id AA10186; Mon, 22 Feb 88 23:48:44 EST
> Message-Id: <8802230448.AA10186 at columbia.edu>
> Date: Fri 22 Jan 88 02:50:00-EST
> From: The Mailer Daemon <Mailer at cs.columbia.edu>
> To: LEBOWITZ at cs.columbia.edu
> Subject: Message of 18-Jan-88 20:13:54
> Resent-Date: Mon 22 Feb 88 23:44:07-EST
> Resent-From: Michael Lebowitz <LEBOWITZ at cs.columbia.edu>
> Resent-To: souders at portia.stanford.edu
> Resent-Message-Id: <12376918538.25.LEBOWITZ at CS.COLUMBIA.EDU>
> Status: R
> 
> Message undeliverable and dequeued after 3 days:
> souders%meridian at ADS.ARPA: Cannot connect to host
>        ------------
> Date: Mon 18 Jan 88 20:13:54-EST
> From: Michael Lebowitz <LEBOWITZ at CS.COLUMBIA.EDU>
> Subject: bigger file part 3
> To: souders%meridian at ADS.ARPA
> In-Reply-To: <8801182147.AA08014 at ADS.ARPA>
> Message-ID: <12367705229.11.LEBOWITZ at CS.COLUMBIA.EDU>
> 
> (DEF-INSTANCE GEORGETOWN
>  (STATE MARYLAND)
>  (LOCATION URBAN)
>  (CONTROL PRIVATE)
>  (NO-OF-STUDENTS THOUS:10-15)
>  (MALE:FEMALE RATIO:45:55)
> ....
> 
> Which dates it to 1988. Nice.
> 
> Barry
> 
> 
> 
> On Thu, Jan 18, 2018 at 9:20 AM, Peter Crowther <peter.crowther at melandra.com> wrote:
> 
>> That's a nice example of why Lisp is both powerful and terrifying - you're
>> looking at a Lisp *program*, not just Lisp *data*, as Lisp makes no
>> distinction between the two.  You just read 'em in.
>> 
>> The two definitions at the bottom are function definitions.  The top one
>> defines the def-instance function.  Reading that indicates that it accepts
>> an atom as a name and a list of key-value or key-range-value lists as
>> properties, where the keys may be repeated to give you multi-valued
>> attributes in your result.  The bottom one defines a function for removing
>> duplicate entries of the same location.
>> 
>> The rest of the file (apart from the included email headers) is a whole
>> load of calls to the def-instance function.  In Lisp, you'd define the
>> functions, then just run the rest of the file.
>> 
>> To my knowledge, there is no generic way to read Lisp "data" into anything
>> else, because of this quirk that data can look like anything.  If anyone
>> can correct me on that, great, but I'd be somewhat surprised.  Therefore,
>> as David intimated, the tools you need are generic tools for handling text,
>> and you'll have to deal with the formatting yourself.  If I were doing a
>> one-off transform of this file, I'd probably reach for vi... but I'm an old
>> Unix hacker.  I certainly wouldn't teach that tooling.  awk or perl could
>> certainly handle it; or if you want to give students a wider view of the
>> world you might wish to try ANTLR and get them to write a grammar to parse
>> the file.  The Clojure grammar (
>> https://github.com/antlr/grammars-v4/blob/master/clojure/Clojure.g4) would
>> be an interesting place to start, although Terence Parr's comment of "match
>> a bunch of crap in parentheses" would probably give a flavour of what to
>> implement.  Depends what else the students are learning.
>> 
>> Hope this helps rather than hinders.
>> 
>> - Peter
>> 
>> On 18 January 2018 at 05:25, Ranjan Maitra <maitra at email.com> wrote:
>> 
>>> Thanks! I am trying to use it in R. (Actually, I try to give my students
>>> experiences with different kinds of files and I was wondering if there
>>> were tools available for such kinds of files. I don't know Lisp so I do
>>> not actually know what the lines towards the bottom of the file mean.)
>>> 
>>> Many thanks for your response!
>>> 
>>> Best wishes,
>>> Ranjan
>>> 
>>> On Wed, 17 Jan 2018 20:59:48 -0800 David Winsemius <dwinsemius at comcast.net> wrote:
>>> 
>>>> 
>>>>> On Jan 17, 2018, at 8:22 PM, Ranjan Maitra <maitra at email.com> wrote:
>>>>> 
>>>>> Dear friends,
>>>>> 
>>>>> Is there a way to read data files written in lisp into R?
>>>>> 
>>>>> Here is the file: https://archive.ics.uci.edu/ml/machine-learning-databases/university/university.data
>>>>> 
>>>>> I would like to read it into R. Any suggestions?
>>>> 
>>>> It's just a text file. What difficulties are you having?
>>>>> 
>>>>> 
>>>>> Thanks very much in advance for pointers on this and best wishes,
>>>>> Ranjan
>>>>> 
>>>>> --
>>>>> Important Notice: This mailbox is ignored: e-mails are set to be
>>>>> deleted on receipt. Please respond to the mailing list if appropriate.
>>>>> For those needing to send personal or professional e-mail, please use
>>>>> appropriate addresses.
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>> 
>>>> 'Any technology distinguishable from magic is insufficiently advanced.'
>>>>    -Gehm's Corollary to Clarke's Third Law
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>>        [[alternative HTML version deleted]]
>> 
>> 
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


