scholar: Analyse citation data from Google Scholar

Retrieving basic information

## Load the package
library(scholar)

## Define the id for Richard Feynman
id <- 'B7vSqZsAAAAJ'

## Get his profile
get_profile(id)

Retrieving publications

get_publications() returns a data.frame of publication records. It contains information about each publication, including the title, author list, page number, number of citations, publication year, etc.

The pubid is the article ID used by Google Scholar; it is the identifier used to retrieve the citation history of a selected publication.

## Get his publications (a large data frame)
p <- get_publications(id)
head(p, 3)
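
As a rough sketch of how the publication data frame can be used, the snippet below builds a small stand-in data frame by hand (so it runs without network access) and looks up the pubid of the most-cited entry. The column names title, cites, year, and pubid are assumed to match what get_publications() returns, and p_demo and its rows are invented placeholders, not real Google Scholar records.

```r
# Hypothetical stand-in for the result of get_publications(id);
# the rows are made-up placeholders, not real records.
p_demo <- data.frame(
  title = c("Paper A", "Paper B", "Paper C"),
  cites = c(120, 45, 300),
  year  = c(1995, 2001, 1988),
  pubid = c("aaa111", "bbb222", "ccc333"),
  stringsAsFactors = FALSE
)

# Order the records by citation count, most cited first
p_sorted <- p_demo[order(p_demo$cites, decreasing = TRUE), ]

# The pubid of the most-cited entry, which could then be passed
# to get_article_cite_history()
top_pubid <- p_sorted$pubid[1]
top_pubid
```

With a real result from get_publications(), the same two steps should apply, provided the column names match the assumption above.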

Retrieving citation data

## Get his citation history, i.e. citations to his work in a given year
ct <- get_citation_history(id)

## Plot citation trend
library(ggplot2)
ggplot(ct, aes(year, cites)) + geom_line() + geom_point()

Users can retrieve the citation history of a particular publication with get_article_cite_history().

## The following publication will be used to demonstrate article citation history
as.character(p$title[1])

## Get article citation history
ach <- get_article_cite_history(id, p$pubid[1])

## Plot citation trend
ggplot(ach, aes(year, cites)) +
    geom_segment(aes(xend = year, yend = 0), linewidth=1, color='darkgrey') +
    geom_point(size=3, color='firebrick')
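
The citation history holds one row per year, so a running total is easy to derive. The sketch below uses a hand-made data frame with the same year and cites columns that the plots above rely on; ach_demo and its values are placeholders for illustration, not real citation counts.

```r
# Hypothetical stand-in for the result of get_article_cite_history();
# the counts are invented for illustration.
ach_demo <- data.frame(
  year  = 2018:2022,
  cites = c(3, 7, 12, 9, 15)
)

# Running total of citations up to and including each year
ach_demo$cumulative <- cumsum(ach_demo$cites)

# Year with the highest number of citations
peak_year <- ach_demo$year[which.max(ach_demo$cites)]

ach_demo
peak_year
```

The cumulative column can be plotted in the same way as cites, for example with aes(year, cumulative).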

Comparing scholars

You can compare the citation histories of scholars by fetching data with compare_scholars().

# Compare Feynman and Stephen Hawking
ids <- c('B7vSqZsAAAAJ', 'DO5oG40AAAAJ')

# Get a data frame comparing the number of citations to their work in
# a given year
cs <- compare_scholars(ids)
## remove some 'bad' records without sufficient information
cs <- dplyr::filter(cs, !is.na(year) & year > 1900) 

ggplot(cs, aes(year, cites, group=name, color=name)) + 
  geom_line() + theme(legend.position="bottom")

## Compare their career trajectories, based on year of first citation
csc <- compare_scholar_careers(ids)
ggplot(csc, aes(career_year, cites, group=name, color=name)) + 
  geom_line() + geom_point() +
  theme(legend.position = "inside", 
    legend.position.inside=c(.2, .8)
  )
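
To make the idea of a career year concrete, the sketch below derives one from a year-by-year comparison table by subtracting each scholar's first cited year. This is only an illustration of the concept: the data are placeholders, and whether compare_scholar_careers() counts from 0 or from 1 is an assumption here (the sketch counts from 0).

```r
# Hypothetical stand-in for the result of compare_scholars();
# names and counts are invented for illustration.
cs_demo <- data.frame(
  name  = c("Scholar A", "Scholar A", "Scholar A", "Scholar B", "Scholar B"),
  year  = c(1990, 1991, 1992, 2005, 2006),
  cites = c(2, 5, 9, 1, 4),
  stringsAsFactors = FALSE
)

# First year with a recorded citation, computed within each scholar
first_year <- ave(cs_demo$year, cs_demo$name, FUN = min)

# Years elapsed since that first cited year (assumed to start at 0)
cs_demo$career_year <- cs_demo$year - first_year

cs_demo
```

Aligning scholars on career_year rather than calendar year lets trajectories that started decades apart be drawn on a shared axis.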

Visualizing and comparing network of coauthors

# Be careful with specifying too many coauthors as the visualization of the
# network can get very messy.
coauthor_network <- get_coauthors('DO5oG40AAAAJ', n_coauthors = 4)

coauthor_network

The package provides a built-in function, plot_coauthors(), to plot this network.

plot_coauthors(coauthor_network)

Note, however, that these are the coauthors listed on the Google Scholar profile, not the coauthors of all publications.

Formatting publications for CV

The format_publications() function can be used, for example, in conjunction with the vitae package to format publications in APA style. The short name of the author of interest (e.g., the person whose CV is being made) can be highlighted in bold with the author.name argument. The function after the pipe lets rmarkdown format the output properly; the code chunk should be set to results = "asis".

APA style

format_publications("DO5oG40AAAAJ", "Guangchuang Yu") |> head() |> cat(sep='\n\n')

Numbering format

format_publications("DO5oG40AAAAJ", "Guangchuang Yu") |> head() |> print(quote=FALSE)
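
To illustrate the bold highlighting that author.name provides, the sketch below wraps an author's short name in Markdown bold markers within a vector of reference strings. This is not how format_publications() is implemented; highlight_author() is a hypothetical helper and the references are placeholders.

```r
# Placeholder reference strings, invented for illustration
refs <- c(
  "Doe, J., Roe, R. (2020). Title one. Journal A.",
  "Roe, R., Doe, J. (2021). Title two. Journal B."
)

# Hypothetical helper: wrap every literal occurrence of `author`
# in Markdown bold markers. fixed = TRUE avoids treating the dots
# in the name as regular-expression wildcards.
highlight_author <- function(x, author) {
  gsub(author, paste0("**", author, "**"), x, fixed = TRUE)
}

out <- highlight_author(refs, "Doe, J.")

# Blank lines between entries so rmarkdown renders separate paragraphs
cat(out, sep = "\n\n")
```

In an rmarkdown chunk set to results = "asis", the ** markers render as bold text.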