Getting Started With Media Cloud

Before You Start…

Log in to Media Cloud in order to access media collections and analytical tools:

https://dashboard.mediacloud.org/

For now, if you do not have a Media Cloud account, you can register here with your @globalvoices.org email address. Once the NewsFrames platform has been built, users will automatically have access to Media Cloud.

Basic Media Cloud Research Ideas

The following ideas will help you as you experiment with Media Cloud:

  • Think about what questions you would like to answer
  • Figure out how much and what kind of information is related and available
  • Run comparisons between collections
  • Run comparisons between keywords
  • Write up your insights from your initial research

Querying

Media Cloud allows users to conduct simple queries and comparisons.

Each query has three elements:

  1. Keywords
  2. The Media Collection that will be searched
  3. A search timeframe

1. Keyword Search

Several words can be included in the keyword field:

KeywordA AND KeywordB AND KeywordC

Omitting Keywords

It is also possible to request that a word be omitted from the results:

KeywordA NOT KeywordB

Wildcards

Wildcards can also be used:

immigrant*

or

*migrant*
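The keyword patterns above can be assembled programmatically before pasting them into the keyword field. The following is only a minimal sketch: `build_query` is a hypothetical helper written for this guide, not part of Media Cloud, and it covers only the AND, NOT, and wildcard forms shown above.

```python
def build_query(required, excluded=None):
    """Join required keywords with AND and optionally omit one keyword with NOT.

    Hypothetical helper for illustration; it only builds the text you would
    type into the keyword field.
    """
    if not required:
        raise ValueError("at least one required keyword is needed")
    query = " AND ".join(required)
    if excluded:
        query += f" NOT {excluded}"
    return query


# Several keywords that must all appear
print(build_query(["KeywordA", "KeywordB", "KeywordC"]))

# Omit a word from the results
print(build_query(["KeywordA"], excluded="KeywordB"))

# Wildcards are passed through unchanged
print(build_query(["immigrant*"]))
```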

2. Searching Article Collections and Sources

Media Cloud has several collections of sources. Examples include U.S. Digital Media based on a list informed by Pew Research, and a snapshot list of sources used by Global Voices in 2013.

Collections are continually being updated: NewsFrames created a collection of sources for Pakistan research and is updating collections for Ecuador, Venezuela and other countries.

In a Media Cloud query you can either combine several collections, or use a single collection. You may also query a single media source from the complete list of sources.

A note on source quality: sources and collections can be limited in scope and sometimes in completeness. Media Cloud attempts to scrape as much publicly available web content as possible, but this does not include, for example, Twitter feeds or Facebook page posts. Before using Media Cloud collections and sources, dig into them: check their health (whether or not they are actively being collected) and the assumptions behind collections like “Global Voices in 2013”.

3. Set a Timeframe for Your Search

Media Cloud allows you to search different time periods: e.g., last week, last month, last year; or use custom start and end dates.
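If you want to record the exact dates behind a preset, a small calculation like the one below may help. It is a sketch under stated assumptions: the preset names mirror the examples above, but the fixed day counts (7, 30, 365) are an approximation chosen here, and Media Cloud may draw the boundaries differently.

```python
from datetime import date, timedelta

# Assumed day counts for each preset; these are approximations, not
# Media Cloud's documented definitions.
PRESETS = {"last week": 7, "last month": 30, "last year": 365}


def timeframe(preset, today=None):
    """Return (start, end) dates for a preset, ending today."""
    today = today or date.today()
    return today - timedelta(days=PRESETS[preset]), today


def custom_timeframe(start, end):
    """Return (start, end) after checking that the range is not reversed."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return start, end


start, end = timeframe("last month", today=date(2017, 3, 15))
print(start, end)
```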

Combining Media Cloud Queries

Media Cloud allows you to combine several queries in a single search. In this way, it is possible to run queries with the same keyword on different collections.

You can also run queries with related (or opposite) keywords on the same collection.
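One way to plan a combined search is to write each query down as its three elements (keywords, collection, timeframe) and then vary only one of them. The sketch below does this in plain Python. The `Query` class and both helper functions are hypothetical names used for note-keeping; they do not call Media Cloud.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Query:
    """The three elements of a query (hypothetical planning structure)."""
    keywords: str
    collection: str
    start: date
    end: date


def same_keyword_across_collections(keywords, collections, start, end):
    """One query per collection, all using the same keywords."""
    return [Query(keywords, c, start, end) for c in collections]


def related_keywords_in_collection(keyword_list, collection, start, end):
    """One query per keyword, all searching the same collection."""
    return [Query(k, collection, start, end) for k in keyword_list]


start, end = date(2017, 1, 1), date(2017, 3, 31)
by_collection = same_keyword_across_collections(
    "immigrant*", ["U.S. Digital Media", "Global Voices in 2013"], start, end
)
by_keyword = related_keywords_in_collection(
    ["immigrant*", "*migrant*"], "U.S. Digital Media", start, end
)
for query in by_collection + by_keyword:
    print(query)
```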

Searching and exploring data to gain insights

Media Cloud searches can generate three potential outputs:

  1. Pulse (Searching a Timeline)
  2. Word Cloud (Searching by Frequency)
  3. Articles (Mentions Matching)

1. Pulse (Searching a Timeline)

The timeline plots a keyword frequency curve over the selected period of time, for the collection of sources in which the search was performed.

The graph makes it easy to see when keyword mentions peaked and declined, and how quickly they did so.

The graph also lets you visualize the comparison between keywords in two or more media collections, or comparisons between two or more keywords in the same media collection.

2. Word Cloud (Searching by Frequency)

Searching by frequency also generates a word cloud where frequency of mention is indicated by the size of the text. Word clouds are built on a random sample of 1,000 sentences which Media Cloud has found to be representative of the entire set of sentences.

A search with just one query yields a standard word cloud. Clicking on any word in the cloud narrows your query further.

A search with two queries yields a visualization that helps you find the most important words in the “main,” “comparison,” and “combined” queries:

  • The left and right columns show up to 100 of the most frequently used words from the main and comparison queries (minus ‘stop words’ such as, in English, ‘and’, ‘the’, or ‘but’). See Media Cloud for additional information on stop words.
  • Words the two queries have in common show up in the center:
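To make the three-column layout more concrete, here is a rough sketch of how such a comparison could be computed. It is not Media Cloud's implementation: the tokenizer, the three-word stop list, and the rule that shared words are moved out of the side columns are all assumptions made for illustration. Only the sample size of 1,000 sentences and the limit of 100 words come from the description above.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"and", "the", "but"}


def top_words(sentences, sample_size=1000, limit=100, seed=0):
    """Count words in a random sample of sentences, ignoring stop words."""
    rng = random.Random(seed)
    if len(sentences) > sample_size:
        sentences = rng.sample(sentences, sample_size)
    counts = Counter(
        word
        for sentence in sentences
        for word in re.findall(r"[a-z']+", sentence.lower())
        if word not in STOP_WORDS
    )
    return dict(counts.most_common(limit))


def compare(main_sentences, comparison_sentences):
    """Split top words into main-only, shared, and comparison-only groups."""
    main = top_words(main_sentences)
    comparison = top_words(comparison_sentences)
    shared = set(main) & set(comparison)
    return {
        "main": {w: n for w, n in main.items() if w not in shared},
        "combined": {w: main[w] + comparison[w] for w in shared},
        "comparison": {w: n for w, n in comparison.items() if w not in shared},
    }


# Made-up sentences, used only to exercise the functions.
result = compare(
    ["The migrants crossed the border", "Migrants and refugees arrived"],
    ["The refugees need shelter", "Shelter but no food"],
)
print(result)
```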

3. Articles (Mentions Matching)

Mentions matching shows you a random sample of sentences within stories that match your query. It also reports the number of sentences that match the query.

Media Cloud will not show you all the sentences in this output, but you can download the full list of stories with the keywords in a .csv file.

By clicking on “view article”, you can read the complete news story sampled for the query, as long as the original story has not been deleted or moved.

Saving Media Cloud Searches

You can use the “Save Search” button to keep track of the searches you perform.

This is very helpful when you are working on an investigation for an in-depth report (and want to return to the query) or are working in a collaborative project.

To return to a saved search, simply use the Load Saved Search option and continue the analysis with the same results:

NOTE: With saved searches, the parameters and the list of stories shouldn't change, but the word cloud might: the query is dynamic, so it is re-run each time you load it. This is why it's good to download the data when you need it.

See “Download data as .csv, .png and .svg” for more information.

Examine Different Time-Frames Using Pulse

Pulse shows you how many sentences match your query each day, week, or month. This supports a few ways to dig into the data:

  • Highlight an area by clicking and dragging on the chart to re-run your search, limited to that span of time:

  • Click a point on the graph to see a word cloud for that query during that day, week, or month:
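The counting behind a chart like Pulse can be sketched as grouping matching sentences by day, week, or month. The code below is an illustration under assumptions, not Media Cloud's code: it assumes each matching sentence carries a publication date, and it assumes weeks start on Monday. The function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def bucket_start(day, resolution):
    """Return the first day of the bucket that contains `day`."""
    if resolution == "day":
        return day
    if resolution == "week":
        # Assumption: weeks start on Monday.
        return day - timedelta(days=day.weekday())
    if resolution == "month":
        return day.replace(day=1)
    raise ValueError(f"unknown resolution: {resolution}")


def pulse(sentence_dates, resolution="day"):
    """Count matching sentences per bucket, in chronological order."""
    counts = Counter(bucket_start(d, resolution) for d in sentence_dates)
    return dict(sorted(counts.items()))


def restrict(sentence_dates, start, end):
    """Keep only dates inside a highlighted span (inclusive)."""
    return [d for d in sentence_dates if start <= d <= end]


# Made-up dates, used only to exercise the functions.
dates = [date(2017, 3, 6), date(2017, 3, 8), date(2017, 3, 14), date(2017, 4, 1)]
print(pulse(dates, "week"))
print(pulse(restrict(dates, date(2017, 3, 1), date(2017, 3, 31)), "month"))
```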

Look Deeper Into Word Clouds

You can click on any word to refine the query (it will run a new search with the stem of the word you clicked on).

Click a query label at the bottom of the chart in Pulse to hide it. That can sometimes make it easier to select the exact query and word cloud you wish to explore. For example, “Shuar US” results have been hidden here:

Keep in mind that each column (main keyword query, keyword queries combined, comparison keyword) is normalized to focus on the estimated total number of unique words in the independent query.

Or to put it another way, columns are not helpful for figuring out how large one query is versus another, or how skewed a word is towards one query or the other.

Download Data As .csv, .png and .svg

Media Cloud users can also download the data to dig into more detail:

  • The frequency data for mentions shown in the Pulse output can be downloaded as a .csv file:

  • The timeline graphic can also be downloaded in .png format or as a .svg file (vector image):

  • The data for the Word Cloud can also be downloaded as a .csv file or as a .svg file:

  • The list of stories that have one or more sentences matching each query can be downloaded as a .csv file:
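Once downloaded, a .csv file can be explored with a few lines of Python. The sketch below counts stories per media source. The column names (`title`, `url`, `media_name`, `publish_date`) are assumptions made for this example, and the rows are placeholders. Check the header row of your own download and adjust `source_column` to match.

```python
import csv
import io
from collections import Counter

# Placeholder data standing in for a downloaded stories file.
# Column names and rows are hypothetical.
SAMPLE_CSV = """title,url,media_name,publish_date
Placeholder story 1,https://example.com/1,Example Source A,2017-03-06
Placeholder story 2,https://example.com/2,Example Source B,2017-03-07
Placeholder story 3,https://example.com/3,Example Source A,2017-03-09
"""


def stories_per_source(csv_file, source_column="media_name"):
    """Count rows per media source in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[source_column] for row in reader)


# With a real download you would open the file instead, for example:
#   with open("stories.csv", newline="", encoding="utf-8") as f:
#       counts = stories_per_source(f)
counts = stories_per_source(io.StringIO(SAMPLE_CSV))
for source, total in counts.most_common():
    print(source, total)
```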