Researchers Release Comprehensive Twitter Dataset of False Claims About The 2020 Election

Published on June 15, 2022

(Editors’ note: This article is co-published with Tech Policy Press.)

On Monday, June 13, the House Select Committee on the January 6 Attack on the U.S. Capitol hosted the second of its planned series of public hearings, focused on the Big Lie that the election was stolen from former President Donald Trump. The Chairman of the Committee, Rep. Bennie Thompson (D-MS), said in his opening statement on Monday that the Committee’s investigation has established that Trump “betrayed the trust of the American people. He ignored the will of the voters. He lied to his supporters and the country. And he tried to remain in office after the people had voted him out—and the courts upheld the will of the people.”

On the same day the Committee laid out evidence that Trump and his associates knew the election was lost even as they cynically pushed the Big Lie, a group of researchers from the University of Washington’s Center for an Informed Public and the Krebs Stamos Group* published a massive dataset of “misinformation, disinformation, and rumors spreading on Twitter about the 2020 U.S. election.” The dataset chronicles the role of key political elites, influencers and supporters of the President in advancing the Big Lie, exploring how key narratives spread on Twitter.

Published in the Journal of Quantitative Description, the paper accompanying the dataset is titled Repeat Spreaders and Election Delegitimization: A Comprehensive Dataset of Misinformation Tweets from the 2020 U.S. Election. The dataset, which the researchers named ElectionMisinfo2020, “is made up of over 49 million tweets connected to 456 distinct misinformation stories spread about the 2020 U.S. election between September 1, 2020 and December 15, 2020,” and it “focuses on false, misleading, exaggerated, or unsubstantiated claims or narratives related to voting, vote counting, and other election procedures.”

“President Trump and other pro-Trump elites in media and politics set an expectation of voter fraud and then eagerly amplified any and every claim about election issues, often with voter fraud framing,” said one of the lead researchers, Dr. Kate Starbird, an associate professor at the University of Washington and co-founder of the Center for an Informed Public. “But everyday people produced many of those claims.” The new report sheds light on how these claims proliferated from the margins to the nation’s Capitol.

The Nature of False Claims about the 2020 Election

The researchers collected 307 distinct “stories” or narratives, encompassing 44.8 million tweets, that they labeled as ‘sowing doubt’ in the 2020 election.** After a painstaking project to annotate and group the stories, the researchers were able to see patterns in “content groups,” or “collections of stories that have similar narrative or thematic components” representing the broadest categories of stories.

Notably, while “four out of the top five content groups were primarily spread by Trump-supporting accounts,” the researchers conclude, a set of allegations about possible U.S. Postal Service involvement in election fraud was primarily advanced by Biden supporters. But the overall proportion of misinformation was “highly skewed toward pro-Trump accounts” throughout the period.

The largest story in the dataset revolved around Dominion voting systems and claims its software “had systematically changed votes from candidate Trump to candidate Biden.” Spawned from reports of a single “mistake in the process of updating the software on vote tabulation computers” in Antrim County, Michigan, which was quickly corrected, the story was adopted by Donald Trump Jr., and then by President Trump, who tweeted about “dominion” 24 times between Nov. 6 and Dec. 15, 2020. “Dominion,” concludes the researchers, “was a prime example of an isolated incident which was reframed to suit the narrative that election fraud was systematic and widespread.”

Another prominent story is what came to be known as “Sharpiegate,” an example of the set of allegations in the ”partisan vote counting/recording” content group. Sharpiegate started “when voters in several polling locations,” mostly in Arizona, “noted that the Sharpie pens they had been given to vote were bleeding through the ballots—and some began to share concern (and later suspicion) that their votes had not been counted.” While the claims were wrong, Sharpiegate concerns, shared first by social media accounts with limited reach but soon spreading to larger accounts, exploded after Arizona was called for Biden. Later, claims on the account @CodeMonkeyZ, run by prominent QAnon figure Ron Watkins, pushed Sharpiegate further.

“SharpieGate”refers to false claims that Trump voters specifically were disenfranchised by being forced to vote with Sharpie pens that purportedly invalidated ballots.

Repeat Spreaders

The analysis was also able to “identify high profile accounts who had highly-retweeted tweets in several distinct misinformation stories—specifically among stories that functioned to sow doubt in election procedures or election results.” These “repeat spreaders” had a disproportionate impact on the total misinformation spread across the entire Twitter platform during the 2020 election, as measured by their spread of “multiple, distinct misinformation stories.” The top 35 “repeat spreaders” include far-right media entities and influencers, QAnon personalities, campaign advisors such as Rudy Giuliani, and the former president and his sons.

Rank	User Screen Name	Verified User	Stories With Large Tweet (>1000 RTs)	Large Tweets (>1000 RTs)	Number of Retweets	Stories With Any Tweet or Retweet
1	RealJamesWoods	Yes	24	30	363,349	29
2	gatewaypundit	Yes	21	85	408,586	38
3	TomFitton	Yes	19	28	140,259	25
4	JackPosobiec	Yes	18	42	165,274	35
5	EricTrump	Yes	17	28	463,353	26
6	realDonaldTrump	Yes	16	55	2,286,540	22
7	DonaldJTrumpJr	Yes	16	24	357,766	45
8	catturd2	No	15	22	75,290	24
9	prayingmedic	No	14	45	118,844	28
10	JamesOKeefeIII	No	13	54	452,749	15
11	ChuckCallesto	Yes	13	37	295,710	21
12	MichaelCoudrey	Yes	13	28	184,850	32
13	ANONYMIZED	No	12	33	71,300	16
14	robbystarbuck	Yes	11	17	78,707	44
15	stillgray	Yes	11	18	75,688	40
16	RichardGrenell	Yes	10	25	289,835	16
17	RealCandaceO	Yes	10	9	248,614	10
18	michellemalkin	Yes	10	28	87,237	18
19	scrowder	Yes	10	17	67,322	12
20	pnjaban	Yes	10	11	46,164	28
21	charliekirk11	Yes	9	28	394,231	12
22	RyanAFournier	Yes	9	10	107,962	32
23	PhillyGOP	No	9	9	36,650	17
24	joshdcaplan	Yes	9	9	30,696	18
25	johncardillo	Yes	9	9	24,726	39
26	RudyGiuliani	Yes	8	14	264,090	8
27	Project_Veritas	Yes	8	26	119,348	12
28	ScottAdamsSays	Yes	8	18	110,475	15
29	jsolomonReports	No	8	25	97,756	10
30	marklevinshow	Yes	8	16	96,395	8
31	seanmdav	Yes	8	10	67,669	42
32	Timcast	Yes	8	11	65,480	10
33	mschlapp	Yes	8	12	56,613	21
34	BreitbartNews	Yes	8	17	45,945	14
35	DiamondandSilk	Yes	8	11	44,071	14

Possible Interventions

The researchers conclude with observations on potential interventions platforms might consider, including addressing “repeat spreaders.” Since incentives on Twitter are “tied to follower interactions, lesser interventions, such as content labeling, are not likely to have a significant impact on the willingness of these accounts to interact with questionable content.” The researchers suggest that a more “fruitful” approach may be to enforce rules “more stringently” on repeat offenders.

This could include the implementation of “strike systems” presently in use on Twitter and YouTube, which impose escalating penalties on accounts for each rule violation to discourage repeat offenses, or a combination of approaches. So far, however, these strike systems appear not to have been sufficient to reduce the reach of repeat spreaders of misinformation: of the 35 repeat spreaders identified in the table above, “most continue to post on Twitter to a wide audience.” While seven of the accounts were suspended after the election, only two were removed for violations of Twitter’s policy on disputed election claims. (Notably, a number of Republican elites have continued to use Twitter to sow doubt in the 2020 election, with no apparent consequences.) More consistency in application, increased penalties, or improvements in social media companies’ capacities to limit the amplification of misinformation may be necessary to avoid repetition of these patterns in upcoming election cycles.

The Path to the Capitol

While the dataset only includes tweets through Dec.15, 2020 – notably before Trump’s Dec. 19 tweet announcing a “[b]ig protest in D.C. on January 6th” that kicked off a frenzy among his supporters – it does collect stories that relate to political violence. The researchers chronicled “118 unique stories in the broader ElectionMisinfo2020 dataset related to violence or threats of violence, split between the content groups intimidation (38), suppression (21), riots (18), discussions of a potential coup (16), protests (16), and discussions of civil war (9),” and posit that more work is necessary to understand the relationship between misinformation about elections and political violence.

On Capitol Hill, the January 6 Select Committee hearings have already made clear that Donald Trump, who not only generated his own false claims about the election but also embraced, endorsed, and amplified the false claims made by his supporters, played a key role in the incitement of the attack on Congress. “Mr. Chairman,” Select Committee Vice Chairwoman Liz Cheney, R-WY, stated in Monday’s hearing, “hundreds of our countrymen have faced criminal charges, many are serving criminal sentences because they believed what Donald Trump said about the election and they acted on it.”

Researcher Kate Starbird notes that Trump and his loyalists were very effective in creating a participatory mechanism designed to manufacture and reinforce false claims to sow doubt in the outcome of the election.

“One thing we should remember is that President Trump and his campaign were not only repeatedly sharing false claims that the election was or would be rigged,” said Starbird, “but they also encouraged his supporters to share evidence of voting issues — through the ’Army for Trump’ and ‘Defend Your Ballot’ initiatives. The Trump campaign provided the structure for participation in the ‘voter fraud’ disinformation campaign. In other words, they not only encouraged people to participate in supporting Trump, but they gave people a mechanism through which to participate — through sharing claims about voting issues and potential fraud.”

Looking Forward

While the January 6 Select Committee is due to complete its work later this year, the ElectionMisinfo2020 dataset will likely serve as a substantial building block for years of future research on phenomena at the intersection of social media, politics, and democracy.

A key question social media researchers seek to address is how to determine the role false claims and disinformation play in political violence. Starbird says while we all saw “hashtag-warriors come to life on the Capitol grounds and then swarming within the Capitol building,” it can be difficult to discern what sparked violence, or to disambiguate “organic” versus “coordinated” behavior. The Proud Boys who instigated the first confrontation with Capitol Police and used force to breach the building certainly bear substantial responsibility for the violence that day, but what encouragement did they feel knowing they had the crowd at their backs — both on the day and in the online spaces where they planned the attack?

The ElectionMisinfo2020 dataset of tweets, massive as it is, may also eventually be federated with other datasets drawn from other social media platforms during the 2020 election cycle. The complete contents of the social media site Parler were scraped by researchers, for instance, through January 2021, and some academic researchers had substantial access to Facebook to conduct research during the 2020 cycle. “I do think it would be very valuable to put some of the datasets together, but it can be methodologically challenging to combine data streams,” said Starbird. Such combinations could produce new insights.

Certainly, understanding how claims that delegitmize elections spread across platforms is urgent business, both for the Select Committee and for the platforms themselves. As the researchers note, according to one poll, “31% of the U.S. population still believe the ‘big lie’ that the election was stolen from then-President Trump.” And far right Republican candidates that have spread falsehoods about the 2020 election are advancing in campaigns for pivotal positions of power related to election administration. With another election cycle looming, it is crucial that insights on how to confront the Big Lie inform decisions taken by firms such as Twitter, Facebook, and YouTube and the recommendations the Select Committee makes to Congress, lest the next mob is too large to resist.

*Alex Stamos, the former Facebook Chief Security Officer who heads the Stanford Internet Observatory, started the Krebs Stamos Group with Christopher Krebs, the former director of the Department of Homeland Security’s (DHS) Cybersecurity and Infrastructure Security Agency (CISA) who was fired by Trump on November 17, 2020 for asserting that the 2020 election was “the most secure in American history.”

**The paper and data set built on the work of the Election Integrity Partnership (EIP), a collaboration of the University of Washington Center for an Informed Public; the Stanford Internet Observatory; the Atlantic Council’s digital forensics research unit, DFRLab; and Graphika, a firm that tracks disinformation. EIP monitored election misinformation in real time ahead of the 2020 election, identifying and tracking the spread of false claims.

Image: WASHINGTON, DC – JUNE 09: A tweet from former President Donald Trump is shown on a screen at a hearing held by the Select Committee to Investigate the January 6th Attack on the U.S. Capitol on June 09, 2022 on Capitol Hill in Washington, DC. The bipartisan committee, which has been gathering evidence related to the January 6, 2021 attack at the U.S. Capitol for almost a year, will present its findings in a series of televised hearings. On January 6, 2021, supporters of President Donald Trump attacked the U.S. Capitol Building in an attempt to disrupt a congressional vote to confirm the electoral college win for Joe Biden. (Photo by Jabin Botsford-Pool/Getty Images)

FEATURED IMAGE: WASHINGTON, DC - JUNE 09: A tweet from former President Donald Trump is shown on a screen at a hearing held by the Select Committee to Investigate the January 6th Attack on the U.S. Capitol on June 09, 2022 on Capitol Hill in Washington, DC. The bipartisan committee, which has been gathering evidence related to the January 6, 2021 attack at the U.S. Capitol for almost a year, will present its findings in a series of televised hearings. On January 6, 2021, supporters of President Donald Trump attacked the U.S. Capitol Building in an attempt to disrupt a congressional vote to confirm the electoral college win for Joe Biden. (Photo by Jabin Botsford-Pool/Getty Images)

Filed Under

2020 presidential election, Disinformation, Domestic Extremism, Donald Trump, Donald Trump Jr, elections, House Select Committee on January 6, January 6 Commission, January 6th Attack on US Capitol, Misinformation, Social Media Platforms, Trump Campaign, Twitter

Send A Letter To The Editor