I used the rtweet package to collect tweets from the turfgrass community in 2017. I learned about the package from Bob Rudis's blog post about rtweet.

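At the end of the year I had some free time and plenty of curiosity, so I finally tried it. I wanted to know whether I could find which turfgrass industry accounts had the most influence or impact. To figure this out, I looked at the following data: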

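  • the number of followers
  • how often the account tweets (tweet rate)
  • how many times the account was mentioned by others
  • how many of the tweets sent from the account were retweeted a lot
  • how many of the tweets sent from the account were favorited a lot
  • the overall rank of the account, based on a composite of its ranks in these categories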
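
The composite ranking in the last item can be sketched in R as follows. This is a minimal illustration, not the code from the analysis: the account names, the numbers, the column names, and the choice to sum the per-category ranks with equal weight are all assumptions made for the example.

```r
# Hypothetical metrics for three made-up accounts (not real data)
metrics <- data.frame(
  screen_name   = c("acct_a", "acct_b", "acct_c"),
  followers     = c(5000, 1200, 300),
  tweet_rate    = c(2.0, 5.5, 0.4),  # tweets per day
  mentions      = c(40, 90, 5),
  top_retweets  = c(3, 10, 0),       # tweets with many retweets
  top_favorites = c(8, 12, 1),       # tweets with many favorites
  stringsAsFactors = FALSE
)

categories <- c("followers", "tweet_rate", "mentions",
                "top_retweets", "top_favorites")

# Rank each category so that 1 is the highest value; ties share the best rank
category_ranks <- sapply(metrics[categories],
                         function(x) rank(-x, ties.method = "min"))

# Assumption: combine by summing the ranks, then rank the sums (lower is better)
metrics$rank_sum <- rowSums(category_ranks)
metrics$overall_rank <- rank(metrics$rank_sum, ties.method = "min")

metrics[order(metrics$overall_rank),
        c("screen_name", "rank_sum", "overall_rank")]
```

Summing ranks treats every category as equally important. A weighted sum or a mean rank would be equally easy to swap in.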

The results of these calculations can be searched and sorted in this Turfgrass Twitter 2017 Shiny app.

The code I used is in the turf_twitter_2017 repository on GitHub, so you can see exactly what I did.

The 6,271 accounts I studied in this analysis, the "turf Twitter," were selected using the following criteria.

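First, I got the follower lists of seven turfgrass scientists. I wanted to study accounts with an interest in new developments in turfgrass, and I figured that almost everyone interested in this topic would follow at least one of these accounts.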

library(rtweet)

# get ids of accounts following the seven turf scientists
atc <- get_followers("asianturfgrass", n = 20000, retryonratelimit = TRUE)
pace <- get_followers("paceturf", n = 20000, retryonratelimit = TRUE)
unl <- get_followers("unlturf", n = 20000, retryonratelimit = TRUE)
tomy <- get_followers("striturf_tomy", n = 20000, retryonratelimit = TRUE)
jyri <- get_followers("Amplify_Turf", n = 20000, retryonratelimit = TRUE)
jk <- get_followers("iTweetTurf", n = 20000, retryonratelimit = TRUE)
unruh <- get_followers("jbunruh", n = 20000, retryonratelimit = TRUE)

Then I checked whether the accounts identified in the previous search also followed one of the golf or sports turf industry associations.

# get accounts following industry organizations, which I'll use as a confirmation
# that the account is probably interested in turf
gcsaa <- get_followers("gcsaa", n = 20000, retryonratelimit = TRUE)
bigga <- get_followers("BIGGAltd", n = 20000, retryonratelimit = TRUE)
canada <- get_followers("GolfSupers", n = 20000, retryonratelimit = TRUE)
agcsa <- get_followers("AGCSA2", n = 20000, retryonratelimit = TRUE)
iog <- get_followers("the_iog", n = 20000, retryonratelimit = TRUE)
stma <- get_followers("FieldExperts", n = 20000, retryonratelimit = TRUE)

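I identified the unique accounts in each list, then intersected the two lists to find the accounts that followed at least one turf scientist and also followed at least one industry association.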

# that's a lot of accounts, and many are duplicates
# to get the accounts to check, first select all those that are following one of the
# turf scientists

followers <- unique(rbind.data.frame(atc, pace, unl, tomy, jyri, jk, unruh))

# make list of unique accounts following the associations
associations <- unique(rbind.data.frame(gcsaa, bigga, canada, agcsa, iog, stma))

# use the intersect function to select only those accounts that are in both follow lists
followers_assoc <- as.data.frame(base::intersect(followers$user_id, associations$user_id))

colnames(followers_assoc) <- "user_id"
followers_assoc$user_id <- as.character(followers_assoc$user_id)
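
The same collect-then-intersect logic can be tried without calling the Twitter API. The sketch below swaps `get_followers()` for a hypothetical stub, `fake_get_followers()`, that returns made-up ids. The handles, the ids, and the helper `collect_followers()` are assumptions for illustration only and are not part of rtweet or of the original analysis.

```r
# Hypothetical stand-in for rtweet::get_followers(); returns made-up ids
fake_get_followers <- function(handle) {
  ids <- list(
    scientist_1 = c("1", "2", "3"),
    scientist_2 = c("3", "4"),
    assoc_1     = c("2", "3", "9"),
    assoc_2     = c("4", "9")
  )
  data.frame(user_id = ids[[handle]], stringsAsFactors = FALSE)
}

# Collect the unique follower ids across a set of accounts
collect_followers <- function(handles, fetch = fake_get_followers) {
  unique(unlist(lapply(handles, function(h) fetch(h)$user_id)))
}

scientist_followers   <- collect_followers(c("scientist_1", "scientist_2"))
association_followers <- collect_followers(c("assoc_1", "assoc_2"))

# Keep only the ids that appear in both groups
both <- data.frame(
  user_id = intersect(scientist_followers, association_followers),
  stringsAsFactors = FALSE
)
both
```

Looping over a vector of handles also avoids repeating one near-identical call per account.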

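Then I removed the accounts that were set to private. I also cut the accounts with fewer than 50 tweets. And I removed accounts that were themselves following 10,000 or more accounts, because those accounts seemed to be using automation to gain followers.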

# get basic data on these accounts
turf_follower <- lookup_users(followers_assoc$user_id)

# this removes the private accounts, also the least active ones
turf_follower2 <- subset(turf_follower, protected == FALSE & statuses_count >= 50)

# this attempts to remove those that seem to have automated follow/follower system
# in order to get a large audience, few removed here are turf related
turf_follower3 <- subset(turf_follower2, friends_count < 1e4)

That left 6,271 accounts for further analysis.
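
The effect of the three filters can be checked on a few made-up rows. The column names match those used in the filtering code. The accounts and counts are invented, and combining the conditions into a single `subset()` call is just a convenience for this sketch.

```r
# Four hypothetical accounts, one for each case the filters handle
toy <- data.frame(
  screen_name    = c("private_acct", "quiet_acct", "mass_follower", "typical_acct"),
  protected      = c(TRUE, FALSE, FALSE, FALSE),
  statuses_count = c(900, 12, 4000, 650),
  friends_count  = c(300, 80, 15000, 420),
  stringsAsFactors = FALSE
)

# Drop private accounts, accounts with fewer than 50 tweets,
# and accounts following 10,000 or more others
kept <- subset(toy, !protected & statuses_count >= 50 & friends_count < 1e4)
kept$screen_name
```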