[R-SIG-Finance] 4-digit SIC codes

G See gsee000 at gmail.com
Tue Feb 5 15:46:44 CET 2013


There are actually non-break spaces in the source code of the page.
If you look at it, you will see things like this:

<B>A/D  <BR>

Whether or not XML::trim gets rid of them for you may be OS-specific.
See an answer to an old question of mine on R-help, for example:
https://stat.ethz.ch/pipermail/r-help/2012-February/302417.html
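
If the non-break spaces survive the import, one way to remove them by hand is sketched below. This is only an illustration, not something from the thread: strip_nbsp is a hypothetical helper name, the small data frame is made-up stand-in data, and the sketch assumes the strings are valid UTF-8. If a stray "Â" still shows up, the encoding may first need to be declared, e.g. with Encoding(x) <- "UTF-8".

```r
# Hypothetical helper (not part of the XML package): replace each
# non-break space (U+00A0) with a plain space, then trim the ends.
strip_nbsp <- function(x) {
  x <- enc2utf8(x)                              # work in UTF-8
  x <- gsub("\u00a0", " ", x, fixed = TRUE)     # NBSP -> ordinary space
  gsub("^[[:space:]]+|[[:space:]]+$", "", x)    # drop leading/trailing space
}

# Made-up stand-in for the scraped table; real data would come from
# readHTMLTable() as in the quoted message below.
sic <- data.frame(
  SICCode = c("100", "2834"),
  Office  = c("5\u00a0", "1\u00a0"),
  stringsAsFactors = FALSE
)

# Apply the helper to every (character) column, keeping the data frame shape.
sic[] <- lapply(sic, strip_nbsp)
```

Doing the replacement explicitly avoids relying on whether the platform's notion of whitespace includes U+00A0.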

Best,
Garrett

On Tue, Feb 5, 2013 at 8:20 AM, David Reiner <David.Reiner at xrtrading.com> wrote:
> Very nice, Garrett!
> More curious than anything, but does anyone know why I get the extraneous characters when I do it?
> They are present in x as well. I believe they are non-breaking spaces.
>
>> head(SIC)
>   SICCode A/D  Office                                    Industry Title
> 4     100            5 Â                    AGRICULTURAL PRODUCTION-CROPS
> 5     200            5 Â AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
> 6     700            5 Â                            AGRICULTURAL SERVICES
> 7     800            5 Â                                         FORESTRY
> 8     900            5 Â                    FISHING, HUNTING AND TRAPPING
> 9    1000            9 Â                                     METAL MINING
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XML_3.95-0.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.2
>
> Thanks,
> -- David Reiner
>
>
> -----Original Message-----
> From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of G See
> Sent: Monday, February 04, 2013 9:30 PM
> To: Bastian Offermann
> Cc: r-sig-finance at r-project.org
> Subject: Re: [R-SIG-Finance] 4-digit SIC codes
>
> I'm not sure, but here's a really quick and dirty way to get it
>
>> library(XML)
>> x <- readHTMLTable("http://www.sec.gov/info/edgar/siccodes.htm",
>                                   stringsAsFactors=FALSE)[[4]]
>> colnames(x) <- x[2, ]
>> SIC <- x[-c(1:3), ]
>> head(SIC)
>   SICCode A/D  Office                                    Industry Title
> 4     100           5                     AGRICULTURAL PRODUCTION-CROPS
> 5     200           5  AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
> 6     700           5                             AGRICULTURAL SERVICES
> 7     800           5                                          FORESTRY
> 8     900           5                     FISHING, HUNTING AND TRAPPING
> 9    1000           9                                      METAL MINING
>
>> SIC[SIC$SICCode == "2834", ]
>    SICCode A/D  Office               Industry Title
> 91    2834           1  PHARMACEUTICAL PREPARATIONS
>
> HTH,
> Garrett
>
> On Mon, Feb 4, 2013 at 9:19 PM, Bastian Offermann
> <bastian2507hk at yahoo.co.uk> wrote:
>> Hi,
>> does anybody know whether 4-digit SIC codes are available in R? Something
>> along the lines
>>
>> "2834" "Pharmaceutical Preparations"
>>
>> Thank you.
>>
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.



