Mark's Logs

Reading, thinking, and seeing.

Data Mashups in R


Ch.I Mapping Foreclosures

  • download

      download.file(url="URL", destfile="DEST")
    
  • regex

      grep()   # find the elements that match a pattern
      gsub()   # replace every match of a pattern
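Here is a small sketch of both functions on made-up strings. The vector `addresses` is hypothetical. `grep()` reports which elements match, and `gsub()` rewrites every match.

```r
# Hypothetical sample strings, e.g. street addresses scraped from a page
addresses <- c("1234 N  Broad St", "567 Main Street", "89 W Girard Ave")

# grep() returns the indices of elements matching the pattern;
# value = TRUE returns the matching elements themselves
idx  <- grep("St", addresses)
hits <- grep("St", addresses, value = TRUE)

# gsub() replaces every match: collapse runs of whitespace to one space
cleaned <- gsub("[[:space:]]+", " ", addresses)

print(idx)
print(hits)
print(cleaned)
```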
  • Yahoo!’s latitude/longitude (geocoding) service: sign up before using it

  • Parse XML

      install.packages("XML")
      library("XML")
      # requestURL: the request URL string, built beforehand
      xmlResult <- xmlTreeParse(requestURL, isURL=TRUE)

      install.packages("RCurl")
      library("RCurl")
  • Proxy

      Sys.setenv("http_proxy" = "http://username:passwd@host:port")
    
  • Magic str()

    It is good practice to examine closely, with str(), the data structures that each package returns.
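For example, on a small hypothetical list shaped roughly like a parsed geocoding result (the names `result`, `coords` and `matches` are made up for illustration):

```r
# Hypothetical nested result, similar in shape to what a parser might return
result <- list(
  address = "1234 N Broad St",
  coords  = c(lat = 39.97, lon = -75.16),
  matches = data.frame(city = c("Philadelphia", "Camden"), score = c(0.9, 0.4))
)

# str() prints a compact summary, one line per component,
# showing types, lengths and nested structure
str(result)
```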

  • Exception handling

      tryCatch({
          xmlResult <- xmlTreeParse(requestURL, isURL=TRUE, addAttributeNamespaces=TRUE)
          # ...other code...
      }, error=function(err) {
          cat("xml parsing or http error:", conditionMessage(err), "\n")
      })
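The same pattern can be wrapped in a function that returns NULL instead of stopping, so a loop over many requests keeps going when one fails. A minimal sketch: `safe_parse()` is a hypothetical helper, and `strict_parser()` is a stand-in that fails on empty input, used here in place of a real network parse.

```r
# Hypothetical helper: run a parsing function and return NULL on failure,
# reporting the error instead of aborting the whole script
safe_parse <- function(x, parser) {
  tryCatch(
    parser(x),
    error = function(err) {
      cat("xml parsing or http error:", conditionMessage(err), "\n")
      NULL
    }
  )
}

# Stand-in parser that raises an error on empty input (illustration only)
strict_parser <- function(x) {
  if (!nzchar(x)) stop("empty response")
  toupper(x)
}

ok  <- safe_parse("abc", strict_parser)  # succeeds
bad <- safe_parse("", strict_parser)     # prints the error, returns NULL
```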
  • Mapping with PBSmapping

      library(PBSmapping)
      plotPolys(polys)   # polys: a PolySet of map polygons, loaded beforehand
  • Exploring Data Structures

      as.numeric()
      levels()
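A common reason to use these two together: calling `as.numeric()` directly on a factor returns the internal level codes, not the values. A minimal sketch with hypothetical income values that were read in as a factor:

```r
# Hypothetical numbers that ended up stored as a factor (e.g. read as text)
incomes <- factor(c("52000", "48000", "52000", "61000"))

# Wrong: as.numeric() on a factor returns the integer level codes
codes <- as.numeric(incomes)

# Right: convert the levels to numbers, then index them by the codes
values <- as.numeric(levels(incomes))[incomes]

print(codes)   # level codes, not incomes
print(values)  # the actual numbers
```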
  • Color

      heat.colors(n)   # a palette of n colors, running from red toward yellow
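One way to use the palette is to bin the values and pick one color per bin. A sketch with hypothetical foreclosure counts; the five equal-width bins are an arbitrary choice for illustration:

```r
# Hypothetical foreclosure counts per area
counts <- c(3, 15, 8, 22, 1)

# Five colors, running from red toward yellow
pal <- heat.colors(5)

# Bin counts into 5 equal-width intervals and pick a color per area;
# rev() makes the highest counts the "hottest" (reddest) color
bins <- cut(counts, breaks = 5, labels = FALSE)
area_cols <- rev(pal)[bins]

print(area_cols)
```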

Ch.II Statistics of Foreclosure

  • The data are available from the U.S. Census Bureau's American FactFinder

  • skip lines when reading from a file

      read.table("FILE", skip=1, na.strings="")   # skip the first line; read empty fields as NA
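A self-contained sketch: the `text =` argument stands in for a real file, and the title line, column names and values are all hypothetical.

```r
# Stand-in for a downloaded file: a title line, a header, then data.
# The second data row has an empty income field.
raw <- "Hypothetical census extract
tract,income,foreclosures
101,52000,4
102,,7
103,61000,2"

# skip = 1 drops the title line; na.strings = "" reads empty fields as NA
# (this matters most for text columns; blank numeric fields are NA anyway)
tracts <- read.table(text = raw, skip = 1, header = TRUE, sep = ",",
                     na.strings = "")
str(tracts)
```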
  • Descriptive Statistics: mean(), median(), sd(), cor(), summary().
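Applied to two hypothetical columns:

```r
# Hypothetical per-tract values
income <- c(38000, 52000, 45000, 61000, 29000)
fcs    <- c(9, 3, 6, 2, 12)

mean(income)        # arithmetic mean
median(income)      # middle value
sd(income)          # sample standard deviation
cor(income, fcs)    # Pearson correlation (negative for these values)
summary(income)     # min, quartiles, median, mean, max
```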

  • lattice

      install.packages("latticeExtra")
      library(lattice)
      library(latticeExtra)
    
    • plot: stripplot() + bwplot()

      # ct$FCs: foreclosure counts; IncomeLevels: income-group factor (both built earlier)
      # The "+ as.layer()" overlay comes from latticeExtra
      print(stripplot(IncomeLevels ~ jitter(ct$FCs),
                      main=list("Foreclosures grouped by National Median Household Income", cex=1),
                      sub=list("Greater or less than $50,000", cex=1),
                      xlab="foreclosures",
                      ylab="household median income",
                      aspect=.3, col="lightblue", pch=2) +
            as.layer(bwplot(IncomeLevels ~ ct$FCs, varwidth=TRUE, box.ratio=0.4,
                            col="blue", pch="|")))
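If latticeExtra isn't available, a similar overlay can be sketched in plain lattice by drawing both layers in one panel function. This is a swapped-in technique, not the book's code, and the data frame `tracts` and its values are hypothetical:

```r
library(lattice)

# Hypothetical tract data: median household income and foreclosure counts
tracts <- data.frame(income = c(31000, 44000, 58000, 72000, 39000, 65000),
                     FCs    = c(12, 8, 3, 2, 10, 4))

# Hypothetical grouping: above or below a $50,000 median income
tracts$IncomeLevels <- factor(ifelse(tracts$income > 50000,
                                     "> $50,000", "<= $50,000"))

set.seed(1)  # fixed seed so the jitter is reproducible when drawn

# Draw jittered points and a variable-width box plot in the same panel
p <- bwplot(IncomeLevels ~ FCs, data = tracts, box.ratio = 0.4,
            panel = function(x, y, ...) {
              panel.stripplot(x, y, jitter.data = TRUE,
                              col = "lightblue", pch = 2)
              panel.bwplot(x, y, varwidth = TRUE, ...)
            },
            xlab = "foreclosures", ylab = "household median income")
print(p)
```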
  • Correlation

    In R, multidimensional correlation graphs can be drawn with pairs(), which plots a scatterplot matrix of every variable against every other.
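A minimal sketch on hypothetical columns: `cor()` gives the coefficients behind the picture, and `pairs()` draws one scatterplot per pair of columns.

```r
# Hypothetical tract-level variables
d <- data.frame(income = c(31000, 44000, 58000, 72000, 39000, 65000),
                FCs    = c(12, 8, 3, 2, 10, 4),
                owners = c(0.45, 0.52, 0.61, 0.70, 0.48, 0.66))

# Correlation matrix: one coefficient per pair of columns
m <- cor(d)
print(round(m, 2))

# Scatterplot matrix: every column plotted against every other
pairs(d, main = "Hypothetical tract variables")
```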

