Discussion of the Python film project: building a knowledge co-creation model through crowdsourcing

This summative essay presents an argument about the implications of crowdsourcing, data, and coding, discussing the nature of knowledge and of building a knowledge model within the study of digital humanities.


About the Project

See the Python film project for details.


Essay

What is the nature of building a knowledge co-creation model through the crowdsourcing of film reviews? In what ways does coding help or hinder the building of that model?

As a traditional medium, films deliver visual information through moving pictures on a screen. The rise of the network industry and the acceleration of the information revolution are changing the traditional pattern of the film industry. The 'big six' studios (20th Century Fox, Warner Bros., Paramount Pictures, Columbia Pictures, Universal Pictures and Walt Disney Pictures) are gradually being supplanted by streaming media companies such as Netflix (MAAC India Institute, 2019). At the same time, the film industry is increasingly permeated by co-creation through crowdsourcing, as online film reviews and ratings from the crowd have come to be accepted as important evaluations for the industry to consider. Web 2.0, a combination of changes in the ways websites are produced and used, has brought digital users on stage to generate co-created digital content and knowledge; online databases and review-aggregation websites for film and television, such as IMDb and Rotten Tomatoes, are prime examples. Based on my digital humanities Python project, in which I scraped review data for the film The Assassin (2015) from Rotten Tomatoes and built visualisations to investigate the audience's perspectives, preferences, and understandings of the film's story and mise-en-scene, this summative essay presents an argument about the implications of crowdsourcing, data, and coding, in order to discuss the nature of knowledge and of building a knowledge model within the study of digital humanities.
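The scraping step of the project can be sketched in miniature. The snippet below uses only Python's standard library and parses a small inline HTML fragment rather than the live site; the `review-text` class name and the sample reviews are hypothetical placeholders, since Rotten Tomatoes' real markup differs and scraping it is subject to its terms of use.

```python
from html.parser import HTMLParser

# Inline stand-in for a fetched review page. The "review-text" class
# is a hypothetical placeholder, not Rotten Tomatoes' actual markup.
SAMPLE_PAGE = """
<div class="review-text">A meditative, painterly wuxia film.</div>
<div class="review-text">Beautiful to look at but slow.</div>
"""

class ReviewParser(HTMLParser):
    """Collects the text of every <div class="review-text"> element."""

    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "review-text") in attrs:
            self.in_review = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_review = False

    def handle_data(self, data):
        if self.in_review and data.strip():
            self.reviews.append(data.strip())

parser = ReviewParser()
parser.feed(SAMPLE_PAGE)
print(parser.reviews)
```

In the actual project a fetched page would be fed to the parser instead of the inline string; the collected review texts then become the input for sentiment analysis and visualisation.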

Crowdsourcing is a new way of interacting with knowledge. Its core concept is the wisdom of large groups of people (Garcia Martinez and Walton, 2014). Web 2.0 facilitates networking between users and builds interaction between technologies and knowledge creation. Digital users can access any digital platform from anywhere and therefore create distributed knowledge (Burdick, 2012). Because of the development of high-speed networks, knowledge creation is becoming saturated, and production, access, and dissemination are becoming 'ever-more distribution processes' (Burdick, 2012, p.50). Crowdsourcing creates new possibilities for anyone with basic technology skills to create, share and gain knowledge: users can generate their own knowledge and collaborate with others to build, edit and manage content together. In the field of digital humanities, this model allows a reconstruction of the humanities with digital technologies, forming new interpretations within a contemporary social framework, linking with contemporary disciplines, and bringing new life to the treatment of contemporary issues. For example, working with Python, film reviews co-created through crowdsourcing can be reinterpreted, organised and presented as visualisations to support academic research or to be used by society at large.

On the other hand, crowdsourcing challenges the power to control knowledge. Traditionally, knowledge and access to it have been controlled by institutions and authorities. Burdick (2012, p.91) points out that the freedom to explore otherness that follows from 'truly social and participatory forms of cultural creation' is vital to the decolonisation of knowledge. As Francis Bacon said, knowledge is power (Ajana, 2017). The decolonisation of knowledge disperses power that might otherwise rest with a platform controlled by dominant individuals and industries, or with the embedded ideologies of a digital environment. It challenges 'the normative assumptions that encode ideological assumptions in operational features' (Burdick, 2012, p.91). To some extent, this openness can motivate users to share their perspectives and knowledge online, because they can express themselves freely without fear of authorities or the constraints of specific ideologies. Comments from other users, or exchanges of knowledge and information, can be incentives as well.

Nevertheless, 'the wisdom of the crowd' does not guarantee higher-quality knowledge. Keen argues that 'what the Web 2.0 revolution is really delivering is superficial observations of the world around us rather than deep analysis, shrill opinion rather than considered judgment' (2007). One reason could be that fast-developing technology has brought about an explosion of information. Open access offers the public a cheap and easy way to generate and deliver messages: anyone with internet access can publish online at little cost. Many valueless messages are generated repeatedly, which makes information even cheaper. Hence, co-creation through crowdsourcing can lack value and become a kind of information garbage. Processing data takes time, and the more data need cleaning, the more time is consumed.

Finally, data is commercialised for marketing use. Social media platforms such as Twitter require large amounts of user data in order to analyse users and their behaviour and to draw out user patterns. Businesses need analysed data to assess market performance and predict customers' preferences, in order to shape marketing strategies, reduce costs and increase profits. The streaming media company Netflix is an example. When Netflix first entered film and television production, it was seen as a disruptor, but the reason it has since overtaken Disney, HBO and Hulu is its comprehensive technological system. In 2012, Netflix launched a project to motivate engineers to continuously update its recommendation algorithms to improve the user experience. At the same time, Netflix applied big data analysis to produce and optimise original content, seeking the most effective way to attract users through content. The aim is to give users highly personalised content recommendations at any time.

When code makers do data mining, it is necessary to consider 'what constitutes the data' and 'the ways in which these data are structured' (Burdick, 2012, p.42). Primarily, one must understand the background of the information sources. The two most widely used review-aggregation websites for film and television are IMDb and Rotten Tomatoes. On IMDb, only registered users can rate films, which means the rating system targets a particular group. Besides, ratings do not always make sense without reviews, and IMDb users leave relatively few reviews. On Rotten Tomatoes, by contrast, people can see debates about a film, which makes a rating not just a result but something grounded in the raters' opinions. Rotten Tomatoes divides its rating system into ratings from critics and ratings from the audience: it tries to organise both reviews and ratings from both groups so that they make as much sense as possible. IMDb is not representative of worldwide audiences; it is very likely U.S.-dominated, since its rating demographics simply divide raters into those from the U.S. and those outside it (IMDb, 2019). As Hollywood is the world's centre of the film industry, IMDb more likely represents the mainstream ideologies and values of the American and world film industry. Therefore, if a project is based on mining data from IMDb, the output data embeds those mainstream ideologies and values as well.

From the screen to the output data, the whole process of information transmission involves two forms of message: information as signals and information as data. Each transmission is a reinterpretation based on a different interpretive context and purpose. The initial visual information is encoded and decoded by the film audience, a process based on the encoding/decoding model. Hall defines the encoding/decoding model as the process by which a media text is produced and then interpreted (Bounds, 1993). The codes are not absolutely symmetrical, owing to understanding and misunderstanding by both 'encoder-producers and decoder-producers' (Hall, 1993, p.510). The information a film presents belongs to its mise-en-scene, defined as all the elements a viewer can see on the screen, such as colour, lighting, settings, props and composition (Sikov, 2011). Mise-en-scene delivers visual information produced by the film's narrative (Kuhn and Westwell, 2012). These visual messages are encoded by the director and cinematographers and transmitted to the audience. Each audience member decodes the messages in a unique way based on personal social and cultural experience. When they write reviews of the film on review-aggregation websites, they must transfer the received messages from visual to textual form to complete the encoding.

Then the transmission process moves to information as data. When a code maker mines reviews and processes them in a programming language such as Python, Natural Language Processing (NLP), which concerns the interactions between computers and human languages, is applied to process the input text and transform it into data. The final results are printed as visuals to be presented to other viewers. This transmission process raises a data-mining question: can humanities knowledge be operationalised and quantified by a programming language, given that programming languages are completely different from natural human language?
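The text-to-data step described above can be illustrated with a small sketch: tabulating word frequencies across a handful of reviews, the kind of count that would then feed a chart or word cloud. The review sentences and the stopword list here are invented for illustration, not actual scraped data.

```python
import re
from collections import Counter

# Hypothetical review snippets standing in for scraped data.
reviews = [
    "The cinematography is stunning and the pacing is slow.",
    "Slow but stunning visuals; the story is hard to follow.",
    "A stunning film with deliberate, slow storytelling.",
]

# Tokenise to lowercase words and drop a few common stopwords.
stopwords = {"the", "is", "and", "a", "to", "with", "but"}
words = []
for review in reviews:
    words.extend(w for w in re.findall(r"[a-z]+", review.lower())
                 if w not in stopwords)

# The resulting counts are the "data" form of the textual reviews,
# ready to be visualised.
freq = Counter(words)
print(freq.most_common(3))
```

Even this tiny example shows the reduction involved: the counts record that 'stunning' and 'slow' recur, but not whether 'slow' was meant as praise or complaint.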

The first issue is to what extent a programming language can recognise social and cultural variables in natural language. The output of a programming language is data. Data has no social or cultural meaning in itself; meaning is given by contexts. For example, when data is put into the framework of a cultural analytics project, it may be represented through visual interfaces such as infographics; when data is put into a digital map to show changes to a historical place over a period of time, it may carry both geographic and historical implications; when data is put into a corpus analysis of historical literature, it may have implications for history and linguistics. Beyond the study of digital humanities, data takes on biological meaning when it is used to quantify the body and the self. On social media platforms, very large amounts of user data are collected and analysed to support commercial activity, or used to predict political campaigns and elections. In short, anything that can be filled with data is able to give data a specific implication. As Lupton puts it, 'digital data are culturally represented as liquid entities that require management and containment' (Lupton, 2016, p.88).

Further, watching a film involves more complicated communication, engaging multiple senses including hearing and vision. Another question therefore arises: can the senses be interpreted and quantified? For the audience, watching a film is largely one-way communication: they receive messages from the screen and try to decode them themselves. Sitting in a darkened cinema, their senses are refined and unconsciously enhanced, allowing them to capture more of the detailed signals the film transmits from the screen. During viewing, all the senses work together to understand and interpret the messages, and the reproduction of these messages is subjective. In Python, TextBlob can be used to determine the sentiment of reviews through a polarity score ranging from -1 (negative statement) to +1 (positive statement) (TextBlob, 2019). The sentiment of a single lexical item or a simple sentence can be defined fairly well. However, TextBlob can struggle to detect satire, owing to the complexity of context, so if a review is complicated enough, the sentiment analysis is unlikely to work precisely. Moreover, polarity works on a scale: if there are too few lexical items to compare against one another on that scale, the polarity of each item cannot be measured precisely. And if people cannot even define and quantify every kind of sentiment themselves, how can sentiment be quantified on a scale?
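The -1 to +1 polarity scale, and the way satire defeats it, can be mimicked with a toy lexicon-based scorer. This is a drastic simplification of what TextBlob actually does (it is not TextBlob's implementation), and the word scores and example sentences are invented purely for illustration.

```python
import re

# Toy sentiment lexicon; scores are invented for illustration and
# mimic the -1..+1 polarity scale that TextBlob reports.
LEXICON = {
    "beautiful": 0.9, "stunning": 0.8, "great": 0.8,
    "boring": -0.7, "terrible": -0.9, "slow": -0.3,
}

def polarity(text: str) -> float:
    """Average the scores of known words; 0.0 if no word is known."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("a stunning and beautiful film"))       # positive
print(polarity("terrible pacing and a boring story"))  # negative
# Satire defeats word-level scoring: this sarcastic line still
# averages out as positive, because "great" outweighs "slow".
print(polarity("oh great, yet another slow masterpiece"))
```

The last line illustrates the essay's point: word-by-word scoring has no access to the ironic context that reverses the sentence's actual sentiment.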

Yet we are still unsure to what extent, and in what other ways, technologies can collaborate with the humanities to form new interpretations and creations of knowledge for academic and social use. Burdick argues that the study of the humanities is not based in 'calculation, automation, or statistical probability' (Burdick, 2012, p.92). The humanities are subjective and observer-dependent, marked by contingency and ambiguity, which allows knowledge to be formed both 'ontologically and socially' (Burdick, 2012, p.92). We need to keep asking to what extent technologies and the humanities exert influence scientifically, socially and culturally. Technologies should work with the humanities to create new models for studying the humanities in contemporary society and in the future.


Filmography

The Assassin (2015)


Bibliography

Ajana, B. (2017). Digital health and the biopolitics of the Quantified Self. Digital Health, 3, p.205520761668950.

Bounds, P. (1993). Cultural Studies: A Student's Guide to Culture, Politics and Society. Plymouth: Studymates.

Burdick, A. (2012). Digital humanities. Cambridge, Mass.: MIT Press.

Garcia Martinez, M. and Walton, B. (2014). The wisdom of crowds: The potential of online communities as a tool for data analysis. Technovation, 34(4), pp.203-214.

Hall, S. (1993). ‘Encoding, decoding’, in Smith, S.D. (ed.) The Cultural Studies Reader. London; New York: Routledge.

IMDb. (2019). Ratings and Reviews for New Movies and TV Shows - IMDb. [online] Available at: https://www.imdb.com/?ref_=nv_home [Accessed 30 Apr. 2019].

MAAC India Institute (2019). Big Six | MAAC India Academy Animation & VFX Industry Blog. [online] Maacindia.com. Available at: https://www.maacindia.com/blog/big-six [Accessed 29 Apr. 2019].

Keen, A. (2007). The Cult of the Amateur. London: Nicholas Brealey.

Lupton, D. (2016). The Quantified Self: A Sociology of Self-Tracking. Cambridge, UK; Malden, MA: Polity.

Kuhn, A. and Westwell, G. (2012). A dictionary of film studies. Oxford: Oxford University Press.

Sikov, E. (2011). Film studies. New York: Columbia University Press.

TextBlob (2019). Natural Language Processing for Beginners: Using TextBlob. [online] Available at: https://www.analyticsvidhya.com/blog/2018/02/natural-language-processing-for-beginners-using-textblob/ [Accessed 30 Apr. 2019].