Gorwa et al. (2020) – Algorithmic content moderation: technical and political challenges in the automation of platform governance Flashcards

1
Q

Abstract

A

As government pressure on major technology companies builds, both firms and legislators are
searching for technical solutions to difficult platform governance puzzles such as hate speech and misinformation. Automated hash-matching and predictive machine learning tools – what we
define here as algorithmic moderation systems – are increasingly being deployed to conduct
content moderation at scale by major platforms for user-generated content such as Facebook,
YouTube and Twitter. This article provides an accessible technical primer on how algorithmic
moderation works; examines some of the existing automated tools used by major platforms to
handle copyright infringement, terrorism and toxic speech; and identifies key political and ethical
issues for these systems as the reliance on them grows. Recent events suggest that algorithmic
moderation has become necessary to manage growing public expectations for increased platform
responsibility, safety and security on the global stage; however, as we demonstrate, these
systems remain opaque, unaccountable and poorly understood. Despite the potential promise of
algorithms or ‘AI’, we show that even ‘well optimized’ moderation systems could exacerbate,
rather than relieve, many existing problems with content policy as enacted by platforms for three
main reasons: automated moderation threatens to (a) further increase opacity, making a
famously non-transparent set of practices even more difficult to understand or audit, (b) further
complicate outstanding issues of fairness and justice in large-scale sociotechnical systems and
(c) re-obscure the fundamentally political nature of speech decisions being executed at scale.

2
Q

Turning to AI for moderation at scale

A

Automated moderation systems have become necessary to manage growing public
expectations for increased platform responsibility, safety and security
- But these systems remain opaque, unaccountable and poorly understood
- The goal of this article is to provide an accessible primer on how automated
  moderation works

3
Q

What is algorithmic moderation?

A
  • Content moderation → governance mechanisms that structure participation in a
    community to facilitate cooperation and prevent abuse
  • In this understanding, moderation includes not only the administrators or moderators
    with the power to remove content or exclude users, but also the design decisions that
    organize how the members of a community engage with one another
  • Algorithmic commercial content moderation (algorithmic moderation) → systems that
    classify user-generated content based on either matching or prediction, leading to a
    decision and governance outcome (e.g., removal, geo-blocking, account takedown)
  • Hard moderation systems → systems that make decisions about content and accounts
  • The focus of this paper lies on hard moderation systems
  • Soft moderation systems → recommender systems, norms, design decisions,
    architectures, etc.
4
Q

A primer on the main technologies involved in algorithmic moderation

A
  • Algorithmic content moderation involves a range of techniques from statistics and
    computer science, which vary in complexity and effectiveness
  • They all aim to identify, match, predict, or classify some piece of content (text, audio,
    image, video, etc.) on the basis of its exact properties or general features
  • There are some major differences in the techniques used depending on the kind of
    matching or classification required, and the types of data considered:
  • A distinction is drawn between systems that aim to match content ('is this file
    depicting the same image as that file?') and systems that aim to classify or predict
    content as belonging to one of several categories ('is this file spam? Is this text
    hate speech?')
5
Q

Hashing:

A

The process of transforming a known example of a piece of content into a ‘hash’ – a string of data meant to uniquely identify the underlying content
- Hashes are useful because they are easy to compute and much smaller than the underlying content, so any given hash can quickly be compared against a large table of existing hashes to see if it matches any of them
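A minimal sketch of this lookup idea in Python, using SHA-256 purely as a stand-in (real moderation systems typically rely on perceptual hashes, covered in a later card); the hash table contents are invented for illustration:

```python
import hashlib

# Hypothetical table of hashes for content already judged to violate policy.
# The single entry is the SHA-256 digest of the empty byte string.
known_hashes = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def hash_bytes(content: bytes) -> str:
    """Return a fixed-length hex digest identifying this exact byte string."""
    return hashlib.sha256(content).hexdigest()

def is_known_violation(upload: bytes) -> bool:
    """Compare the upload's hash against the table; a set lookup is O(1)."""
    return hash_bytes(upload) in known_hashes

print(is_known_violation(b""))          # True: matches the hash in the table
print(is_known_violation(b"new post"))  # False: unseen content does not match
```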

6
Q
  • Secure cryptographic hash functions
A

→ aim to create hashes that appear to be random, giving away no clues about the content from which they are derived
- They are useful for checking the integrity of a piece of data or code to make sure that no unauthorized modifications have been made
- For example, if a software vendor publishes a hash of the software’s installation file, and the user downloads the software from somewhere where it may have been modified, the user can check the integrity by computing the hash locally and comparing it to the vendor’s
- Cryptographic hash functions are not useful for content moderation, because they are sensitive to any changes in the underlying content, such that a minor modification (changing the color of one pixel in an image) will result in a completely different hash value
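A short sketch of both points above: the vendor-style integrity check, and the brittleness (avalanche effect) that makes cryptographic hashes unsuitable for matching modified content. The byte strings are invented for illustration:

```python
import hashlib

installer = b"example installer bytes"
vendor_published_hash = hashlib.sha256(installer).hexdigest()

downloaded = b"example installer bytes"   # unmodified copy
tampered   = b"example installer bytez"   # a single byte changed

# Integrity check: the unmodified download matches the vendor's published hash.
print(hashlib.sha256(downloaded).hexdigest() == vendor_published_hash)  # True

# A one-byte change yields a completely different, unrelated digest.
print(hashlib.sha256(tampered).hexdigest() == vendor_published_hash)    # False
print(hashlib.sha256(tampered).hexdigest()[:16], vendor_published_hash[:16])
```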

7
Q

Perceptual hashing

A

Involves fingerprinting certain perceptually salient features of content, such as corners in images or hertz-frequency over time in audio

  • This type of hashing can be more robust to changes that are irrelevant to how
    humans perceive the content
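A sketch of perceptual matching using the third-party Pillow and ImageHash packages (an assumption; the paper does not prescribe a library). Unlike a cryptographic hash, the perceptual hash of a resized or re-encoded copy stays close to the original, so a small Hamming distance counts as a match; the threshold below is a hypothetical tuning choice:

```python
from PIL import Image
import imagehash

HAMMING_THRESHOLD = 8  # hypothetical tolerance; platforms tune this trade-off

def matches_known_image(upload_path: str, known_hashes: list) -> bool:
    """Return True if the upload is perceptually close to any known fingerprint."""
    upload_hash = imagehash.phash(Image.open(upload_path))
    # Subtracting two ImageHash objects gives their Hamming distance.
    return any(upload_hash - known < HAMMING_THRESHOLD for known in known_hashes)

# Usage sketch (file names are hypothetical):
# known = [imagehash.phash(Image.open("flagged_original.jpg"))]
# matches_known_image("re-encoded_copy.jpg", known)  # likely True despite re-encoding
```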
8
Q

Classification

A

assesses newly uploaded content that has no corresponding previous version in a database

  • The aim is to put new content into one of a number of categories
9
Q
  • Modern classification tools
A

Machine learning (the automatic induction of statistical patterns from data)

  • One of the main branches of machine learning is supervised learning: models are
    trained to predict outcomes based on labelled instances (offensive/not offensive)
  • Content classification → based on manually coded features

It is hard to capture the context of a text or word when using this type of classification
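A toy supervised-learning sketch, not the paper's model: a classifier is trained on manually labelled instances and then predicts a label for new text. The tiny dataset and the offensive/not-offensive labels are invented for illustration, using scikit-learn:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Manually labelled training instances (hypothetical examples).
texts  = ["you are wonderful", "have a nice day", "you are an idiot", "go away idiot"]
labels = ["not_offensive", "not_offensive", "offensive", "offensive"]

# Induce statistical patterns from the labelled data, then classify new content.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["what an idiot"]))       # likely ['offensive']
print(model.predict(["what a nice person"]))  # likely ['not_offensive']
```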

10
Q

Bag of words

A

Treats all of the words in a sentence as features, ignoring order and grammar
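A minimal sketch of a bag-of-words representation with scikit-learn: every word becomes a feature count, and word order and grammar are discarded, which is one reason context is lost:

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(["the dog bit the man", "the man bit the dog"])

print(vectorizer.get_feature_names_out())  # ['bit' 'dog' 'man' 'the']
print(vectors.toarray())                   # identical rows: word order is ignored
# [[1 1 1 2]
#  [1 1 1 2]]
```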

11
Q

Word embeddings

A

Represent the position of a word in relation to all the other words that usually appear around it

  • Semantically similar words therefore have similar positions
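A sketch of the intuition only: each word is represented as a dense vector positioned by the contexts it appears in, and similar words end up close together. The three-dimensional vectors below are invented for illustration; real embeddings (e.g. word2vec or GloVe) have hundreds of dimensions learned from large corpora:

```python
import numpy as np

embeddings = {
    "idiot":  np.array([0.9, 0.1, 0.0]),
    "moron":  np.array([0.8, 0.2, 0.1]),   # near "idiot": appears in similar contexts
    "banana": np.array([0.0, 0.9, 0.4]),   # far away: unrelated contexts
}

def cosine_similarity(a, b):
    """Standard measure of closeness between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["idiot"], embeddings["moron"]))   # high
print(cosine_similarity(embeddings["idiot"], embeddings["banana"]))  # low
```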
12
Q

Matching and classification have some important differences:

A
  • Matching requires a manual process of collating and curating individual examples of
    the content to be matched (particular terrorist images)
  • Classification involves inducing generalizations about features of many examples from a given category into which unknown examples may be classified (terrorist images in general)
13
Q

An algorithmic moderation typology

A
  • The specific fashion in which these matching or predictive systems are deployed depends
    greatly on a variety of factors, including:
      • The type of community
      • The type of content it must deal with
      • The expectations placed upon the platform by various governance stakeholders
  • Automated tools are used by platforms to police content across a host of issue areas at
    scale, including terrorism, graphic violence, toxic speech (hate speech, harassment and
    bullying), sexual content, child abuse, and spam or fake account detection
14
Q

Once content has been identified as a match, or is predicted to fall into a category of
content that violates a platform’s rules, there are several possible outcomes:

A
  1. Flagging: content is placed either in a regular queue, indistinguishable from user-flagged
     content, or in a priority queue where it will be seen faster, or by specific ‘expert’
     moderators
  2. Deletion: content is removed outright or prevented from being uploaded in the first
     place
  • Fully automated decision-making systems that do not include a human in the loop are
    dangerous
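A schematic sketch, not the paper's architecture, of how a platform might route matches and predictions into the outcomes listed above. The thresholds, queue names and review step are assumptions for illustration:

```python
from dataclasses import dataclass

DELETE_THRESHOLD = 0.98    # hypothetical: delete or block only on near-certainty
PRIORITY_THRESHOLD = 0.80  # hypothetical: send high scores to 'expert' reviewers

@dataclass
class Decision:
    action: str        # "delete", "priority_queue", "regular_queue", or "allow"
    needs_human: bool  # whether a human moderator remains in the loop

def route(hash_match: bool, violation_score: float) -> Decision:
    """Map a match or prediction onto a governance outcome."""
    if hash_match or violation_score >= DELETE_THRESHOLD:
        return Decision("delete", needs_human=False)   # removed or blocked at upload
    if violation_score >= PRIORITY_THRESHOLD:
        return Decision("priority_queue", needs_human=True)
    if violation_score > 0.5:
        return Decision("regular_queue", needs_human=True)
    return Decision("allow", needs_human=False)

print(route(hash_match=True, violation_score=0.0))    # delete, no human in the loop
print(route(hash_match=False, violation_score=0.85))  # priority queue for review
```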
15
Q

Copyright

A

Content ID (YouTube) is unique in that it allows copyright holders to upload material that will be (a) searched against existing content on YouTube and (b) added to a hash database used to detect new uploads of that content

  • In the copyright context, the goal of deploying automatic systems is not only to find identical files but also to identify different instances and performances of cultural
    works that may be protected by copyright
  • A key concern in the deployment of automated moderation technologies in the context of
    copyright is systematic over-blocking
16
Q

Terrorism

A

• EU Code of Conduct on Countering Illegal Hate Speech Online: commits the firms to
  a wide-ranging set of principles, including:
      • Takedown of hateful speech within 24 hours under platform terms of service
      • Intensification of cooperation between themselves and other platforms and social
        media companies to enhance best-practice sharing
• Each firm applies its own policies and definitions of terrorist content when deciding
  whether to remove content when a match to a shared hash is found
17
Q

Toxic speech

A
  • Toxicity of comments / conversational health: umbrella terms for various concepts,
    including hate speech, offence, profanity, personal attacks, slights, defamatory claims,
    bullying and harassment
  • By training machine learning algorithms on large corpora of texts manually labelled for
    toxicity, developers aim to create automatic classification systems to flag ‘toxic’ comments
  • Perspective → an application programming interface (API) with the stated aim of making
    it easier to host better conversations
  • A platform could use Perspective to receive a score which predicts the impact a comment
    might have on a conversation
  • The clearest problem with moderation of toxic speech is that language is incredibly
    complicated, personal and context-dependent: words that are widely accepted to be
    slurs may even be used by members of a group to reclaim those terms
  • Insufficient context awareness can lead crude classifiers to flag content for
    adjudication by moderators who usually do not have the context required to tell
    whether the speaker is a member of the group that the ‘hate speech’ is being directed
    against
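A sketch of how a platform might query Perspective for a toxicity score, based on its publicly documented REST interface (the endpoint version and response shape may change, so treat the details as assumptions and consult the current documentation); the API key and the 0.8 flagging threshold are also assumptions for illustration:

```python
import requests

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def toxicity_score(text: str, api_key: str) -> float:
    """Request a TOXICITY score (a probability-like value between 0 and 1)."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": api_key}, json=payload)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Usage sketch: flag the comment for human review if the predicted toxicity is high.
# if toxicity_score(comment_text, API_KEY) > 0.8:
#     send_to_moderation_queue(comment_text)
```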
18
Q

Three political issues: transparency, fairness and depoliticization

A
  • There is outsized concern about over-blocking → it is very difficult for predictive classifiers
    to make difficult, contextual decisions on slippery concepts like hate speech, and
    automated systems at scale are likely to make lots of incorrect decisions on a daily basis
  • The use of automated techniques can potentially help firms remove illegal content more
    quickly and effectively
19
Q

Decisional transparency

A

A common critique of automated decision making is the potential lack of transparency,
especially when claims of commercial intellectual property are used to deflect
responsibility

  • In content moderation it will become more difficult to decipher the dynamics of takedowns (and potential human rights harms) around some policy issues when the initial flagging decisions were made by automated systems
  • From a user perspective, there is little transparency around whether (or to what extent) an
    automated decision factored into a takedown
20
Q

Justice

A
  • Content classifiers in general, whether used for recommendation, ranking, or blocking,
    may be more or less favorable to content associated with gender, race, and other
    protected categories
  • Hate speech classifiers designed to detect violations of a platform’s guidelines could be
    disproportionally flagging language used by a certain social group, thus making that
    group’s expression more likely to be removed.
  • Fairness critiques often miss broader structural issues, and risk being blind to wider
    patterns of systemic harm
21
Q

De-politicization

A

Algorithmic moderation has already introduced a level of obscurity and complexity into the
inner workings of content decisions made around issues of economic or political
importance, such as copyright and terrorism
• This elides the political question of who exactly is considered a terrorist group
• As algorithmic moderation becomes more seamlessly integrated into users’ day-to-day
  online experience, human rights advocates and researchers must continue to challenge
  both the discourse and reality of the use of automated decision-making in moderation