(Editors’ note: This article is co-published with Tech Policy Press.)

On Monday, June 13, the House Select Committee on the January 6 Attack on the U.S. Capitol hosted the second of its planned series of public hearings, focused on the Big Lie that the election was stolen from former President Donald Trump. The Chairman of the Committee, Rep. Bennie Thompson (D-MS), said in his opening statement on Monday that the Committee’s investigation has established that Trump “betrayed the trust of the American people. He ignored the will of the voters. He lied to his supporters and the country. And he tried to remain in office after the people had voted him out—and the courts upheld the will of the people.”

On the same day the Committee laid out evidence that Trump and his associates knew the election was lost even as they cynically pushed the Big Lie, a group of researchers from the University of Washington’s Center for an Informed Public and the Krebs Stamos Group* published a massive dataset of “misinformation, disinformation, and rumors spreading on Twitter about the 2020 U.S. election.” The dataset chronicles the role of key political elites, influencers and supporters of the President in advancing the Big Lie, exploring how key narratives spread on Twitter.

Published in the Journal of Quantitative Description, the paper accompanying the dataset is titled Repeat Spreaders and Election Delegitimization: A Comprehensive Dataset of Misinformation Tweets from the 2020 U.S. Election. The dataset, which the researchers named ElectionMisinfo2020, “is made up of over 49 million tweets connected to 456 distinct misinformation stories spread about the 2020 U.S. election between September 1, 2020 and December 15, 2020,” and it “focuses on false, misleading, exaggerated, or unsubstantiated claims or narratives related to voting, vote counting, and other election procedures.”

“President Trump and other pro-Trump elites in media and politics set an expectation of voter fraud and then eagerly amplified any and every claim about election issues, often with voter fraud framing,” said one of the lead researchers, Dr. Kate Starbird, an associate professor at the University of Washington and co-founder of the Center for an Informed Public. “But everyday people produced many of those claims.” The new report sheds light on how these claims proliferated from the margins to the nation’s Capitol.

The Nature of False Claims about the 2020 Election

The researchers collected 307 distinct “stories” or narratives, encompassing 44.8 million tweets, that they labeled as ‘sowing doubt’ in the 2020 election.** After a painstaking project to annotate and group the stories, the researchers were able to see patterns in “content groups,” or “collections of stories that have similar narrative or thematic components” representing the broadest categories of stories.

Notably, while “four out of the top five content groups were primarily spread by Trump-supporting accounts,” the researchers conclude, a set of allegations about possible U.S. Postal Service involvement in election fraud was primarily advanced by Biden supporters. But the overall proportion of misinformation was “highly skewed toward pro-Trump accounts” throughout the period.

A table showing the five most common "content groups" in the dataset. All but the third column are a light red. Column 1 is headed "TECHNOLOGY"; the body reads: "The largest group includes a range of allegations, such as “stories alleging that votes were changed digitally, about hacking allegations, and about the false claims concerning Dominion voting systems.”" Column 2 is headed: "BALLOT HARVESTING" and reads "The second largest group of false claims concern allegations of "Ballot Harvesting", including “that operatives targeted and misled elderly voters” to collect mail-in ballots." Column 3 is highlighted in blue and headed "USPS ACCUSATIONS"; it reads: "The third largest group “focused on false accusations that the U.S. Postal Service (including its leadership and/or employees) was improperly interfering with the election and was mostly spread by Biden-supporting accounts.”" Column 4 is headed "PARTISAN VOTE COUNTING/RECORDING" and reads: "The fourth largest group concerned allegations “surrounding how votes were counted, labeled ‘Partisan Vote Counting/Recording.’ Many of those stories were allegations about improper handling of ballots, or delays in counting timed to benefit one candidate.”" Column 5 is headed "VOTES CAST BY DEAD PEOPLE" and reads: "The fifth largest content group “was made up of stories which alleged that votes had been cast in the name of dead people.”"

The largest story in the dataset revolved around Dominion voting systems and claims that its software “had systematically changed votes from candidate Trump to candidate Biden.” Spawned from reports of a single “mistake in the process of updating the software on vote tabulation computers” in Antrim County, Michigan, which was quickly corrected, the story was adopted by Donald Trump Jr., and then by President Trump, who tweeted about “dominion” 24 times between Nov. 6 and Dec. 15, 2020. “Dominion,” conclude the researchers, “was a prime example of an isolated incident which was reframed to suit the narrative that election fraud was systematic and widespread.”

Another prominent story is what came to be known as “Sharpiegate,” an example of the set of allegations in the “partisan vote counting/recording” content group. Sharpiegate started “when voters in several polling locations,” mostly in Arizona, “noted that the Sharpie pens they had been given to vote were bleeding through the ballots—and some began to share concern (and later suspicion) that their votes had not been counted.” The claims were wrong, but Sharpiegate concerns, shared first by social media accounts with limited reach and soon picked up by larger accounts, exploded after Arizona was called for Biden. Later, claims on the account @CodeMonkeyZ, run by prominent QAnon figure Ron Watkins, pushed Sharpiegate further.

“SharpieGate” refers to false claims that Trump voters specifically were disenfranchised by being forced to vote with Sharpie pens that purportedly invalidated ballots.

Repeat Spreaders

The analysis was also able to “identify high profile accounts who had highly-retweeted tweets in several distinct misinformation stories—specifically among stories that functioned to sow doubt in election procedures or election results.” These “repeat spreaders” had a disproportionate impact on the total misinformation spread across the entire Twitter platform during the 2020 election, as measured by their spread of “multiple, distinct misinformation stories.” The top 35 “repeat spreaders” include far-right media entities and influencers, QAnon personalities, campaign advisors such as Rudy Giuliani, and the former president and his sons.

| Rank | User Screen Name | Verified User | Stories With Large Tweet (>1000 RTs) | Large Tweets (>1000 RTs) | Number of Retweets | Stories With Any Tweet or Retweet |
|---|---|---|---|---|---|---|
| 1 | RealJamesWoods | Yes | 24 | 30 | 363,349 | 29 |
| 2 | gatewaypundit | Yes | 21 | 85 | 408,586 | 38 |
| 3 | TomFitton | Yes | 19 | 28 | 140,259 | 25 |
| 4 | JackPosobiec | Yes | 18 | 42 | 165,274 | 35 |
| 5 | EricTrump | Yes | 17 | 28 | 463,353 | 26 |
| 6 | realDonaldTrump | Yes | 16 | 55 | 2,286,540 | 22 |
| 7 | DonaldJTrumpJr | Yes | 16 | 24 | 357,766 | 45 |
| 8 | catturd2 | No | 15 | 22 | 75,290 | 24 |
| 9 | prayingmedic | No | 14 | 45 | 118,844 | 28 |
| 10 | JamesOKeefeIII | No | 13 | 54 | 452,749 | 15 |
| 11 | ChuckCallesto | Yes | 13 | 37 | 295,710 | 21 |
| 12 | MichaelCoudrey | Yes | 13 | 28 | 184,850 | 32 |
| 13 | ANONYMIZED | No | 12 | 33 | 71,300 | 16 |
| 14 | robbystarbuck | Yes | 11 | 17 | 78,707 | 44 |
| 15 | stillgray | Yes | 11 | 18 | 75,688 | 40 |
| 16 | RichardGrenell | Yes | 10 | 25 | 289,835 | 16 |
| 17 | RealCandaceO | Yes | 10 | 9 | 248,614 | 10 |
| 18 | michellemalkin | Yes | 10 | 28 | 87,237 | 18 |
| 19 | scrowder | Yes | 10 | 17 | 67,322 | 12 |
| 20 | pnjaban | Yes | 10 | 11 | 46,164 | 28 |
| 21 | charliekirk11 | Yes | 9 | 28 | 394,231 | 12 |
| 22 | RyanAFournier | Yes | 9 | 10 | 107,962 | 32 |
| 23 | PhillyGOP | No | 9 | 9 | 36,650 | 17 |
| 24 | joshdcaplan | Yes | 9 | 9 | 30,696 | 18 |
| 25 | johncardillo | Yes | 9 | 9 | 24,726 | 39 |
| 26 | RudyGiuliani | Yes | 8 | 14 | 264,090 | 8 |
| 27 | Project_Veritas | Yes | 8 | 26 | 119,348 | 12 |
| 28 | ScottAdamsSays | Yes | 8 | 18 | 110,475 | 15 |
| 29 | jsolomonReports | No | 8 | 25 | 97,756 | 10 |
| 30 | marklevinshow | Yes | 8 | 16 | 96,395 | 8 |
| 31 | seanmdav | Yes | 8 | 10 | 67,669 | 42 |
| 32 | Timcast | Yes | 8 | 11 | 65,480 | 10 |
| 33 | mschlapp | Yes | 8 | 12 | 56,613 | 21 |
| 34 | BreitbartNews | Yes | 8 | 17 | 45,945 | 14 |
| 35 | DiamondandSilk | Yes | 8 | 11 | 44,071 | 14 |
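
For readers who want to see how a ranking like the one above might be computed, here is a minimal sketch in Python. It is not the researchers' code. The record layout (`Tweet`), the function name `rank_repeat_spreaders`, the `min_stories` cutoff of two, and the choice to sum retweets over large tweets only are all assumptions made for illustration. The 1,000-retweet threshold comes from the table's column headings, and the tie-break by retweet count matches the ordering visible in the table.

```python
from collections import defaultdict
from typing import NamedTuple

LARGE_TWEET_RTS = 1000  # threshold from the table header (>1000 RTs)


class Tweet(NamedTuple):
    # Hypothetical record layout, not the dataset's actual schema.
    screen_name: str
    story_id: str
    retweets: int


def rank_repeat_spreaders(tweets, min_stories=2):
    """Rank accounts by the number of distinct stories containing one of their large tweets.

    min_stories is an assumed cutoff; the paper speaks of "several" stories.
    """
    stories = defaultdict(set)        # account -> stories with a large tweet
    large_tweets = defaultdict(int)   # account -> number of large tweets
    retweets = defaultdict(int)       # account -> retweets of large tweets (assumption)

    for t in tweets:
        if t.retweets > LARGE_TWEET_RTS:
            stories[t.screen_name].add(t.story_id)
            large_tweets[t.screen_name] += 1
            retweets[t.screen_name] += t.retweets

    rows = [
        (name, len(s), large_tweets[name], retweets[name])
        for name, s in stories.items()
        if len(s) >= min_stories
    ]
    # Sort by distinct stories, then by retweets, both descending.
    rows.sort(key=lambda r: (-r[1], -r[3]))
    return rows


# Made-up sample data, for illustration only.
sample = [
    Tweet("account_a", "story_1", 5000),
    Tweet("account_a", "story_2", 1500),
    Tweet("account_a", "story_2", 200),    # below threshold, ignored
    Tweet("account_b", "story_1", 90000),  # one story only, so not a repeat spreader
    Tweet("account_c", "story_1", 1200),
    Tweet("account_c", "story_3", 1100),
    Tweet("account_c", "story_4", 1001),
]

for row in rank_repeat_spreaders(sample):
    print(row)
```

In this toy example, `account_b` has by far the most retweets but appears in only one story, so it is excluded. The measure rewards breadth across stories rather than the reach of any single viral tweet.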

Possible Interventions

The researchers conclude with observations on potential interventions platforms might consider, including addressing “repeat spreaders.” Since incentives on Twitter are “tied to follower interactions, lesser interventions, such as content labeling, are not likely to have a significant impact on the willingness of these accounts to interact with questionable content.” The researchers suggest that a more “fruitful” approach may be to enforce rules “more stringently” on repeat offenders.

This could include the implementation of “strike systems” presently in use on Twitter and YouTube, which impose escalating penalties on accounts for each rule violation to discourage repeat offenses, or a combination of approaches. So far, however, these strike systems appear not to have been sufficient to reduce the reach of repeat spreaders of misinformation: of the 35 repeat spreaders identified in the table above, “most continue to post on Twitter to a wide audience.” While seven of the accounts were suspended after the election, only two were removed for violations of Twitter’s policy on disputed election claims. (Notably, a number of Republican elites have continued to use Twitter to sow doubt in the 2020 election, with no apparent consequences.) More consistency in application, increased penalties, or improvements in social media companies’ capacities to limit the amplification of misinformation may be necessary to avoid repetition of these patterns in upcoming election cycles.

The Path to the Capitol

While the dataset only includes tweets through Dec. 15, 2020 – notably before Trump’s Dec. 19 tweet announcing a “[b]ig protest in D.C. on January 6th” that kicked off a frenzy among his supporters – it does collect stories that relate to political violence. The researchers chronicled “118 unique stories in the broader ElectionMisinfo2020 dataset related to violence or threats of violence, split between the content groups intimidation (38), suppression (21), riots (18), discussions of a potential coup (16), protests (16), and discussions of civil war (9),” and posit that more work is necessary to understand the relationship between misinformation about elections and political violence.

On Capitol Hill, the January 6 Select Committee hearings have already made clear that Donald Trump played a key role in the incitement of the attack on Congress. He not only generated his own false claims about the election but also embraced, endorsed, and amplified the false claims made by his supporters. “Mr. Chairman,” Select Committee Vice Chairwoman Liz Cheney (R-WY) stated in Monday’s hearing, “hundreds of our countrymen have faced criminal charges, many are serving criminal sentences because they believed what Donald Trump said about the election and they acted on it.”

Researcher Kate Starbird notes that Trump and his loyalists were very effective in creating a participatory mechanism designed to manufacture and reinforce false claims to sow doubt in the outcome of the election.

“One thing we should remember is that President Trump and his campaign were not only repeatedly sharing false claims that the election was or would be rigged,” said Starbird, “but they also encouraged his supporters to share evidence of voting issues — through the ‘Army for Trump’ and ‘Defend Your Ballot’ initiatives. The Trump campaign provided the structure for participation in the ‘voter fraud’ disinformation campaign. In other words, they not only encouraged people to participate in supporting Trump, but they gave people a mechanism through which to participate — through sharing claims about voting issues and potential fraud.”

Looking Forward

While the January 6 Select Committee is due to complete its work later this year, the ElectionMisinfo2020 dataset will likely serve as a substantial building block for years of future research on phenomena at the intersection of social media, politics, and democracy.

A key question social media researchers seek to address is how to determine the role false claims and disinformation play in political violence. Starbird says while we all saw “hashtag-warriors come to life on the Capitol grounds and then swarming within the Capitol building,” it can be difficult to discern what sparked violence, or to disambiguate “organic” versus “coordinated” behavior. The Proud Boys who instigated the first confrontation with Capitol Police and used force to breach the building certainly bear substantial responsibility for the violence that day, but what encouragement did they feel knowing they had the crowd at their backs — both on the day and in the online spaces where they planned the attack?

The ElectionMisinfo2020 dataset of tweets, massive as it is, may also eventually be federated with other datasets drawn from other social media platforms during the 2020 election cycle. The complete contents of the social media site Parler were scraped by researchers, for instance, through January 2021, and some academic researchers had substantial access to Facebook to conduct research during the 2020 cycle. “I do think it would be very valuable to put some of the datasets together, but it can be methodologically challenging to combine data streams,” said Starbird. Such combinations could produce new insights.

Certainly, understanding how claims that delegitimize elections spread across platforms is urgent business, both for the Select Committee and for the platforms themselves. As the researchers note, according to one poll, “31% of the U.S. population still believe the ‘big lie’ that the election was stolen from then-President Trump.” And far-right Republican candidates who have spread falsehoods about the 2020 election are advancing in campaigns for pivotal positions of power related to election administration. With another election cycle looming, it is crucial that insights on how to confront the Big Lie inform the decisions taken by firms such as Twitter, Facebook, and YouTube and the recommendations the Select Committee makes to Congress, lest the next mob be too large to resist.


*Alex Stamos, the former Facebook Chief Security Officer who heads the Stanford Internet Observatory, started the Krebs Stamos Group with Christopher Krebs, the former director of the Department of Homeland Security’s (DHS) Cybersecurity and Infrastructure Security Agency (CISA) who was fired by Trump on November 17, 2020 for asserting that the 2020 election was “the most secure in American history.”

**The paper and dataset built on the work of the Election Integrity Partnership (EIP), a collaboration of the University of Washington Center for an Informed Public; the Stanford Internet Observatory; the Atlantic Council’s digital forensics research unit, DFRLab; and Graphika, a firm that tracks disinformation. EIP monitored election misinformation in real time ahead of the 2020 election, identifying and tracking the spread of false claims.

Image: WASHINGTON, DC – JUNE 09: A tweet from former President Donald Trump is shown on a screen at a hearing held by the Select Committee to Investigate the January 6th Attack on the U.S. Capitol on June 09, 2022 on Capitol Hill in Washington, DC. The bipartisan committee, which has been gathering evidence related to the January 6, 2021 attack at the U.S. Capitol for almost a year, will present its findings in a series of televised hearings. On January 6, 2021, supporters of President Donald Trump attacked the U.S. Capitol Building in an attempt to disrupt a congressional vote to confirm the electoral college win for Joe Biden. (Photo by Jabin Botsford-Pool/Getty Images)