1 Data Summary

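# Packages used throughout: tidyr (pivot_longer), dplyr (filter, %>%), ggplot2, plotly
library(tidyr)
library(dplyr)
library(ggplot2)
library(plotly)

income <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/income_per_person.csv")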

life <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/life_expectancy_years.csv")

# Reshape data set such that there are only three columns (Geo, Year, & Income)
new_income <- pivot_longer(income, cols = -geo, names_to = "year", values_to = "income")

new_life <- pivot_longer(life, cols = -geo, names_to = "year", values_to = "life.expectancy")

## Create new data set
LifeExpIncom <- merge(new_life, new_income, by = c("geo", "year"))

## Read in More Data
country <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/countries_total.csv")

pop <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/population_total.csv")

new_pop <- pivot_longer(pop, cols = -geo, names_to = "year", values_to = "population")

## Merge LifeExpIncom with Country
merged <- merge(LifeExpIncom, country, by.x = "geo", by.y = "name", all.x = TRUE)

## Merge Population with Merged Data
fin_data <- merge(new_pop, merged, by = c("geo", "year"), all.x = TRUE)

## Get Data for Year 2000
final_data <- subset(fin_data, year == "X2000")

We first read in two datasets, “income” and “life,” which hold income per person and life expectancy by country over many years. “income” has 193 observations and 220 variables, while “life” has 187 observations and 220 variables. Next, we reshape both datasets from wide to long format so that each has only three columns: geo, year, and either income or life.expectancy. We then merge the reshaped sets on geo and year into a dataset called “LifeExpIncom,” which contains geo, year, life.expectancy, and income (40953 observations and 4 variables). Next, we read in two more datasets: “country” (240 observations and 11 variables) and “pop” (195 observations and 220 variables), holding country and population data, respectively. We reshape “pop” the same way so that its year values sit in a single column, matching “LifeExpIncom.” After this, we merge “LifeExpIncom” with “country” by country name and then merge the result with the reshaped “pop” set, creating a dataset called “fin_data” (42705 observations and 15 variables). Finally, we subset the data to the year 2000 (stored as “X2000” because read.csv prefixes numeric column names with “X”), resulting in our “final_data” set (195 observations and 15 variables).
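The reshape-then-merge steps described above can be tried on a small scale. The sketch below uses two made-up wide tables (the names `toy_income` and `toy_life` and all values are hypothetical, not taken from the real files) laid out like the originals, with one row per country and one column per year.

```r
library(tidyr)

# Hypothetical wide tables: one row per country, one column per year
toy_income <- data.frame(geo = c("A", "B"), X1999 = c(100, 200), X2000 = c(110, 210))
toy_life   <- data.frame(geo = c("A", "B"), X1999 = c(60, 70),   X2000 = c(61, 71))

# Wide to long: every year column becomes a (year, value) pair
toy_income_long <- pivot_longer(toy_income, cols = -geo,
                                names_to = "year", values_to = "income")
toy_life_long   <- pivot_longer(toy_life, cols = -geo,
                                names_to = "year", values_to = "life.expectancy")

# Inner join on country and year
toy_merged <- merge(toy_life_long, toy_income_long, by = c("geo", "year"))

dim(toy_merged)  # 4 rows (2 countries x 2 years), 4 columns
```

Because `merge()` defaults to an inner join, only country-year pairs present in both tables survive, which is one plausible reason a merged row count can differ from that of its inputs.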

2 GGPlot

The scatter plot below shows the relationship between income, life expectancy, and population size across countries in the year 2000. Life expectancy is on the x-axis and income is on the y-axis. Each point represents a country, with point size corresponding to that country’s population. Points are color-coded by region.

scatter_pop <- ggplot(final_data, aes(x = life.expectancy, y = income, color = region, size = population)) +
  geom_point() +
  labs(title = "Life Expectancy vs. Income per Region (2000)",
       x = "Life Expectancy",
       y = "Income",
       size = "Population",
       color = "Region")
scatter_pop

From the plot, we observe a slightly positive correlation between income and life expectancy: countries with higher incomes tend to have longer life expectancies. Countries in the Americas and Asia tend to have larger populations, as indicated by the larger point sizes. This may also suggest that countries with larger populations have longer life expectancies, though point size alone is weak evidence for that. European countries appear to have the longest life expectancies, with most of their points on the far right side of the graph, although their populations are not as large as those of other regions. Next, we subset the data to the year 2015, again stored in “final_data” (195 observations and 15 variables). The full dataset “fin_data” still includes data from all years, not just 2015.

## Get Data for Year 2015
final_data <- subset(fin_data, year == "X2015")
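The "slightly positive correlation" read off the year-2000 scatter plot can also be put into numbers. The sketch below is illustrative only: the `toy` data frame and its values are hypothetical, and in the analysis the same calls would be applied to the income and life.expectancy columns of the year-2000 subset.

```r
# Hypothetical values standing in for one year of country data
toy <- data.frame(
  income          = c(500, 1500, 4000, 12000, 30000, NA),
  life.expectancy = c(52, 60, 66, 72, 79, 70)
)

# Pearson correlation on raw income and on log income (rows with NA are dropped)
r_raw <- cor(toy$income, toy$life.expectancy, use = "complete.obs")
r_log <- cor(log(toy$income), toy$life.expectancy, use = "complete.obs")

# Spearman rank correlation depends only on ordering, so logging income does not change it
rho <- cor(toy$income, toy$life.expectancy, method = "spearman", use = "complete.obs")

c(pearson_raw = r_raw, pearson_log = r_log, spearman = rho)
```

Income is usually strongly right-skewed, so a linear (Pearson) correlation on the raw scale can look weaker than the underlying monotone relationship. Comparing it with the log-scale or rank-based figure is one way to check that.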

3 Plotly

The plot below shows the relationship between income, life expectancy, and population size across different regions over several years. Each point represents a country, with point size corresponding to that country’s population. The countries are color-coded by region. To keep the largest countries from overwhelming the plot, point size is based on a logarithmic transformation of population, (2*log(population) - 11)^2, rather than on raw population. This transformation compresses the range of population sizes, which shrinks the largest points and makes the plot clearer and easier to interpret. Additionally, the x-axis uses a logarithmic scale to better show the wide range of income values. The plot is animated to show how these relationships change over time, providing a dynamic view of global trends in income, life expectancy, and population:

pal.IBM <- c("#332288", "#117733", "#0072B2","#D55E00", "#882255")
pal.IBM <- setNames(pal.IBM, c("Asia", "Europe", "Africa", "Americas", "Oceania"))

# Ensure no NA values in the region column
final_data <- final_data %>%
  filter(!is.na(region))  # Remove rows with NA in the region column

# Filter data to remove NA values and convert year to numeric
final_data$year <- as.numeric(gsub("X", "", final_data$year))
final_data <- final_data %>%
  filter(!is.na(life.expectancy) & !is.na(income) & !is.na(population))

fig <- final_data %>%
  plot_ly(
    x = ~income, 
    y = ~life.expectancy, 
    size = ~(2*log(population)-11)^2,
    color = ~region, 
    colors = pal.IBM,   # custom colors
    frame = ~year,      # the time variable to animate over
    text = ~paste("Country:", geo,
                  "<br>Region:", region,
                  "<br>Year:", year,
                  "<br>Life Expectancy:", life.expectancy,
                  "<br>Population:", population,
                  "<br>Income per Person:", income),
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  )
fig <- fig %>% layout(
    title = "Income vs. Life Expectancy Over Time",
    # a single xaxis list, so the log scale and the axis title are both applied
    xaxis = list(
      type = "log",
      title = "Income per Person (Log Scale)"
    ),
    yaxis = list(title = "Life Expectancy")
  )

fig
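To see how much the point-size transformation used in the plot code compresses population, it can be evaluated on a few round numbers. The helper name `size_transform` and the populations below are hypothetical and chosen only for illustration; the formula itself is copied from the `size` argument above.

```r
# Point-size transform, copied from the plot code
size_transform <- function(population) (2 * log(population) - 11)^2

# Hypothetical populations spanning four orders of magnitude
toy_pop  <- c(1e5, 1e6, 1e7, 1e8, 1e9)
toy_size <- size_transform(toy_pop)

data.frame(population = toy_pop, size = round(toy_size, 1))

# Ratio of largest to smallest value, before and after the transform
max(toy_pop) / min(toy_pop)    # 10000
max(toy_size) / min(toy_size)  # roughly 6.4
```

One caveat: because the expression is squared, it reaches zero at a population of exp(5.5), about 245, and rises again below that. It therefore orders points by population only for populations above that threshold.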

The x-axis represents the income levels for each country, with higher incomes positioned further to the right. The y-axis represents life expectancy, with higher life expectancies positioned higher on the axis. From the plot, we can observe that some countries dominate the scatter plot due to their larger population sizes and higher incomes. This visualization allows us to analyze whether countries with higher incomes generally have longer life expectancies. Additionally, we can examine whether there is a correlation between population size and income levels, helping to identify trends and patterns in the data.

In the animated plot, each frame corresponds to a different year, showing how the relationship between income, life expectancy, and population size evolves over time. The size of each point is determined by the population of the country, with larger points indicating larger populations. The color of the points indicates the region to which the country belongs, allowing us to see regional trends and differences more clearly. By observing the animation, we can identify how economic and health outcomes have changed across different regions and time periods, providing insights into global development patterns.
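The frame-per-year mechanism described above can be sketched on a tiny example. Everything in the `toy` data frame is hypothetical, and the timing values are arbitrary choices. Note that the `frame` variable needs more than one distinct year for the animation to have anything to step through.

```r
library(plotly)

# Hypothetical data: two countries observed in three years
toy <- data.frame(
  geo             = rep(c("A", "B"), each = 3),
  year            = rep(c(2000, 2005, 2010), times = 2),
  income          = c(1000, 1500, 2200, 20000, 24000, 27000),
  life.expectancy = c(55, 58, 62, 76, 78, 80)
)

toy_fig <- plot_ly(
    toy,
    x = ~income,
    y = ~life.expectancy,
    frame = ~year,        # one animation frame per distinct year
    text = ~geo,
    type = "scatter",
    mode = "markers"
  ) %>%
  layout(
    xaxis = list(type = "log", title = "Income per Person (Log Scale)"),
    yaxis = list(title = "Life Expectancy")
  ) %>%
  animation_opts(frame = 800, transition = 400) %>%        # milliseconds per frame and per transition
  animation_slider(currentvalue = list(prefix = "Year: "))  # label on the year slider

toy_fig
```

Slowing the frames down with `animation_opts()` and labeling the slider are optional, but they can make it easier to follow individual countries from one year to the next.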

