diff --git a/manuscript.md b/manuscript.md
index 6bf7446433444e8f0614172572b5f89566387147..b3df2cc0f13687a8d49fd7437d061f2acda4ed48 100644
--- a/manuscript.md
+++ b/manuscript.md
@@ -26,16 +26,14 @@ In the age of growing science communication, this tendency for scientists to use
 To address this, we created a tool to analyze complexity of a given scientist’s work relative to other writing sources. The tool first quantifies existing text repositories with varying complexity, and subsequently uses this output as a reference to contextualize the readability of user-selected written work.
 
-While other readability tools currently exist to report the complexity of a single document, this tool uses a more data-driven approach to provide authors with insights into the readability of their published work with regard to other text repositories. This will enable them to monitor the complexity of their writing with regard to other available text types, and lead to the creation of more accessible online material. We hope it will help scientists interested in science communication to make their published work more accessible to a broad audience, and lead to an improved global communication and understanding of complex topics.
+While other readability tools currently exist to report the complexity of a single document, this tool uses a more data-driven approach to provide authors with insights into the readability of their published work relative to other text repositories. This enables authors to monitor the complexity of their writing and guides readability improvements to their online material. We hope it will help scientists interested in science communication make their published work more accessible to a broad audience, and lead to improved global communication and understanding of complex topics.
 
 ## Methods
 
 ### Text Analysis Metrics
 
 We built a web-scraping and text analysis infrastructure by extending many existing Free and Open Source (FOS) tools, including Google Scrape, Beautiful Soup, and Selenium.
 
-The Flesch-Kincaid readability score [@Kincaid:1975] is the most commonly used metric to assess readability, and was used here to quantify the complexity of each text item.
-
-Before analysis of the user input, we query and analyze a number of available text repositories with varying complexity. The Flesch-Kincaid readability score was calculated for each time in the repository.
+Before analysis of the user input, we query a number of available text repositories with varying complexity:
 
 | Text Source | Mean Complexity | Description |
 |----------|----------|:-------------:|
@@ -44,7 +42,9 @@ Before analysis of the user input, we query and analyze a number of available te
 | Post-Modern Essay Generator (PMEG) | 16.5 | generates output consisting of sentences that obey the rules of written English, but without restraints on the semantic conceptual references |
 | Art Corpus | 18.68 | library of scientific papers published in The Royal Society of Chemistry |
 
-The author's name entered by the user is queried through Google Scholar, returning the results from articles containing the author's name. The Flesch-Kincaid readability score is then calculated for each of these articles.
+The author's name entered by the user is then queried through Google Scholar, returning results from articles containing the author's name.
+
+For each item, the Flesch-Kincaid readability score [@Kincaid:1975] - the most commonly used metric to assess readability - is used to quantify its complexity.
 ### Plot Information
 The entered author name generates a histogram binned by readability score, which is initially populated exclusively by the ART corpus [@Soldatova:2007] data. We use this data because it is a pre-established library of scientific papers.
 The resulting graph displays the mean writing complexity of the entered author against a distribution of ART corpus content.
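
The scoring step the patch describes can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own implementation: the vowel-group syllable counter is a naive stand-in for a proper estimator, and the sample sentences are hypothetical.

```python
import re

def count_syllables(word):
    """Naive syllable estimate: one syllable per run of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level (Kincaid et al., 1975):
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Hypothetical comparison: a plain sentence vs. dense academic prose.
plain = flesch_kincaid_grade("The cat sat on the mat.")
dense = flesch_kincaid_grade(
    "Quantification of heterogeneous textual repositories "
    "necessitates systematic normalization of readability metrics."
)
print(plain < dense)  # the denser text yields the higher (harder) grade
```

A higher grade means harder text, which is why the ART corpus (18.68) sits above Wikipedia-style prose in the reference table.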
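
The plotting step in "Plot Information" can likewise be sketched without the web front end: bin the reference scores into a histogram, then place the author's mean against that distribution. The bin width and all score values below are illustrative assumptions, not data from the tool.

```python
from collections import Counter
from statistics import mean

def histogram_bins(scores, width=2.0):
    """Count scores per bin of the given width, keyed by each bin's lower edge."""
    return Counter((s // width) * width for s in scores)

# Hypothetical scores standing in for the ART corpus distribution.
corpus = [14.2, 15.1, 16.8, 17.3, 18.0, 18.9, 19.4, 20.6, 21.2]
author = [12.5, 13.8, 15.0]  # hypothetical per-article scores for one author

bins = histogram_bins(corpus)
print(sorted(bins))           # bin edges occupied by the reference data
print(round(mean(author), 2)) # the author's mean, drawn against those bins
```

In the tool itself this comparison is rendered graphically; the binning logic is the same regardless of the plotting library used.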