[R] Help with Binning Data

Bert Gunter bgunter.4567 at gmail.com
Fri Sep 11 00:33:54 CEST 2015


1. Posting in HTML largely negated your ability to provide data
through dput(). Folow he posting guide and post in PLAIN TEXT only,
please.

2. See ?cut  . I think this will at least get you started.

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Thu, Sep 10, 2015 at 3:28 PM, Shouro Dasgupta <shouro at gmail.com> wrote:
> Dear all,
>
> I have 3-hourly temperature data from 1970-2010 for 122 cities in the US. I
> would like to bin this data by city-year-week. My idea is if the
> temperature for a particular city in a given week falls within a given
> range (-17.78 & -12.22), (-12.22 & -6.67), ... (37.78 & 43.33), then the
> corresponding bin would have a value of 1 and 0 otherwise.
>
> The data looks like this. Basically, I need to generate a dummy variable
> for each temperature range. Any help will be greatly appreciated.
>
> tmp2<- dput(head(tmp1,10))
>> structure(list(yearday = c(1970001L, 1970001L, 1970001L, 1970001L,
>> 1970001L, 1970001L, 1970001L, 1970001L, 1970001L, 1970001L),
>>     City = structure(1:10, .Label = c("AKRON", "ALBANY", "ALBUQUERQUE",
>>     "ALLENTOWN", "ATLANTA", "AUSTIN", "BALTIMORE", "BATON ROUGE",
>>     "BERKELEY", "BIRMINGHAM", "BOISE", "BOSTON", "BRIDGEPORT",
>>     "BUFFALO", "CAMBRIDGE", "CAMDEN", "CANTON", "CHARLOTTE",
>>     "CHATTANOOGA", "CHICAGO", "CINCINNATI", "CLEVELAND", "COLORADO
>> SPRINGS",
>>     "COLUMBUS", "CORPUS CHRISTI", "DALLAS", "DAYTON", "DENVER",
>>     "DES MOINES", "DETROIT", "DULUTH", "EL PASO", "ELIZABETH",
>>     "ERIE", "EVANSVILLE", "FALL RIVER", "FLINT", "FORT WAYNE",
>>     "FRESNO", "FT WORTH", "GARY", "GLENDALE", "GRAND RAPIDS",
>>     "HARTFORD", "HONOLULU", "HOUSTON", "INDIANAPOLIS", "JACKSONVILLE",
>>     "JERSEY CITY", "KANSAS CITY", "KANSAS ITY", "KNOXVILLE",
>>     "Lansing ", "LAS VEGAS", "LEXINGTON", "LINCOLN", "LITTLE ROCK",
>>     "LONG BEACH", "LOS ANGELES", "LOUISVILLE", "LOWELL", "LYNN",
>>     "MADISON", "MEMPHIS", "MIAMI", "MILWAUKEE", "MINNEAPOLIS",
>>     "MOBILE", "MONTGOMERY", "NASHVILLE", "NEW BEDFORD", "NEW HAVEN",
>>     "NEW ORLEANS", "NEW YORK CITY", "NEWARK", "NORFOLK", "OAKLAND",
>>     "OGDEN", "OKLAHOMA CITY", "OMAHA", "PASADENA", "PATERSON",
>>     "PEORIA", "PHILADELPHIA", "PHOENIX", "PITTSBURG", "PORTLAND",
>>     "PROVIDENCE", "PUEBLO", "READING", "RICHMOND", "ROCHESTER",
>>     "ROCKFORD", "SACRAMENTO", "SALT LAKE CITY", "SAN ANTONIO",
>>     "SAN CRUZ", "SAN DIEGO", "SAN FRANCISCO", "SAN JOSE", "SAVANNAH",
>>     "SCHENECTADY", "SCRANTON", "SEATTLE", "SHREVEPORT", "SOMERVILLE",
>>     "SOUTH BEND", "SPOKANE", "SPRINGFIELD", "ST LOUIS", "ST PAUL",
>>     "ST PETERSBURG", "SYRACUSE", "TACOMA", "TAMPA", "TOLEDO",
>>     "TRENTON", "TUCSON", "TULSA", "UTICA", "WASHINGTON", "WATERBURY",
>>     "WICHITA", "WILMINGTON", "WORCESTER", "YONKERS", "YOUNGSTOWN"
>>     ), class = "factor"), cell_number = c(17379L, 17027L, 19514L,
>>     17745L, 20256L, 21323L, 18104L, 21329L, 18779L, 20254L),
>>     longitude = c(-81.519005, -73.756232, -106.609991, -75.490183,
>>     -84.387982, -97.743061, -76.612189, -91.14032, -121.635963,
>>     -86.80249), latitude = c(41.081445, 42.652579, 35.110703,
>>     40.608431, 33.748995, 30.267153, 39.290385, 30.458283, 37.871744,
>>     33.520661), State = structure(c(29L, 28L, 27L, 32L, 10L,
>>     35L, 19L, 17L, 4L, 1L), .Label = c(" ALA", " ARIZ", " ARK",
>>     " CAL", " COLO", " CONN", " DC", " DEL", " FLA", " GA", " HAWAII",
>>     " ILL", " IND", " IOWA", " KANS", " KY", " LA", " MASS",
>>     " MD", " MICH", " MINN", " MO", " NC", " NEBR", " NEV", " NJ",
>>     " NM", " NY", " OHIO", " OKLA", " ORE", " PA", " RI", " TENN",
>>     " TEX", " UTAH", " VA", " WASH", " WIS", "CAL", "CONN", "IDAH",
>>     "KY", "MASS"), class = "factor"), avsft = c(-7.81, -16.06,
>>     -7.71999999999997, -1.88999999999999, 2.90000000000003, 5.12,
>>     -5.02999999999997, 9.33000000000004, 15.08, 2.89000000000004
>>     ), year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
>>     1970L, 1970L, 1970L), day = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>     1L, 1L, 1L), hour = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>>     0L), yearweek = c(197001L, 197001L, 197001L, 197001L, 197001L,
>>     197001L, 197001L, 197001L, 197001L, 197001L), week = c(1L,
>>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("yearday",
>> "City", "cell_number", "longitude", "latitude", "State", "avsft",
>> "year", "day", "hour", "yearweek", "week"), row.names = c(NA,
>> 10L), class = "data.frame")
>
>
> Sincerely,
>
> Shouro
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list