NOTE: This post has been published and archived on The Winnower. Please link to and cite that version instead of this one:
Juan Pablo Alperin, Alessandra Bordini, Sophie Pouyanne, PLOS, Please publish our articles on Wednesdays: A look at altmetrics by day of publication, The Winnower 2:e142972.29198 (2015). DOI: 10.15200/winn.142972.29198
One of the most fun parts of doing quantitative research is the exploratory analysis that often precedes a more rigorous and focused attempt at answering a research question. At the early stages of a research project, plotting different variables with only a vague notion of a question in mind can help determine what the data “look like.” At this stage in the process, all explorations are equally valid. Unexpected relationships or patterns are uncovered without having to worry about statistical models or significance of relationships, or whether the uncovered pattern answers a “research question” (whatever that means!).
One need not have advanced knowledge of mathematics and statistics in order to look at and learn from data. Of course, both are useful for analyzing and understanding data more fully, but the ability and knowledge required to extract, manipulate, and interpret data can be developed by anyone with enough intellectual curiosity and a desire to challenge their theoretical or heuristic assumptions. Metrics and measurement are powerful strategic tools for understanding the world around us, and every student – whether a business major, a publishing graduate, or a future software engineer – should have an opportunity to familiarize themselves and experiment with them.
This is why metrics & measurement feature in the seminar course Technology and Evolving Forms of Publishing, and why data analysis was a project option for the Technology Project course in SFU’s Master of Publishing Program. The hope is that, through these courses, Master of Publishing students learn both the value and the limits of working with quantitative data.
One such group of four students—Team Commander Data—decided they were up to the challenge. They chose to explore the PLOS Article Level Metrics (ALM) dataset. This particular version of the dataset included all metrics collected by the PLOS Lagotto application, for all PLOS articles published up until February 9, 2015. The team, however, only analyzed the articles published in 2014.
Team Commander Data are not the first to use these data or other datasets like it, as the number of studies on social media metrics (altmetrics) continues to grow. Even earlier this month, a special issue of ASLIB Proceedings focused on social media metrics was published. Clearly, social media metrics are a current topic in need of more researchers asking critical questions that will have resounding implications for the scholarly community around the world, such as: Will publishing an article on one day of the week lead to more social media mentions than on another?
Team Commander Data set out to answer this very important question, and the results were a little surprising. The team looked at three of the most widely used social media channels—Twitter, Facebook, and Mendeley (the academic social network/reference manager)—and it looks as though articles published closer to the middle of the week receive more mentions on Twitter and Facebook. This pattern holds regardless of whether one focuses on the median or the mean (although it probably makes more sense to look at the median, given that the variables are not normally distributed).
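As a sketch of how such a comparison might be run (the column names and numbers below are invented for illustration; the real ALM export uses its own headers), one can tag each article with its weekday of publication and aggregate with pandas:

```python
import pandas as pd

# Toy stand-in for the ALM export; "publication_date" and "twitter"
# are hypothetical column names, not those of the actual dataset.
df = pd.DataFrame({
    "publication_date": pd.to_datetime(
        ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-10"]
    ),
    "twitter": [3, 5, 9, 7, 2],
})

# Tag each article with its weekday of publication
df["weekday"] = df["publication_date"].dt.day_name()

# Median and mean mentions per weekday of publication
summary = df.groupby("weekday")["twitter"].agg(["median", "mean"])
print(summary)
```

The same grouping works for the Facebook and Mendeley columns, and for a real run one would read the full 2014 extract with `pd.read_csv` instead of the toy frame above.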
The box plot below shows the mean (the line that goes across the boxes), the median (the division between the light and dark grey), the first and third quartiles (top and bottom of boxes), and the first standard deviation (the “whiskers” on the boxes).
One possible explanation is that social media mentions happen very close to the date of publication—within a couple of days—and that people sharing research articles on social media are most active in the middle of the week. The dataset included no data on what time the mentions happened, but it was possible to explore the relationship between time and the metrics in a little more detail by looking at the average number of mentions per month (tweets, posts, or saves) for all articles published in 2014, to see how metrics evolve over time. Mendeley saves take a long time to accumulate, so older articles have much higher save counts than newer ones; Twitter mentions, by contrast, must be happening close to the publication date, as there is no decrease over time; and Facebook falls somewhere in between (though closer to Twitter’s pattern).
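A minimal sketch of that by-month view, again with hypothetical column names and toy numbers standing in for the real ALM data:

```python
import pandas as pd

# Toy stand-in for the 2014 articles; column names and values are invented.
df = pd.DataFrame({
    "publication_date": pd.to_datetime(
        ["2014-01-15", "2014-01-20", "2014-06-10", "2014-12-05"]
    ),
    "mendeley": [40, 35, 20, 3],
    "twitter": [6, 8, 7, 5],
})

# Group articles by their month of publication and average each metric;
# a steady decline toward the most recent months would suggest that the
# metric keeps accumulating long after publication (as with Mendeley saves).
df["month"] = df["publication_date"].dt.to_period("M")
monthly = df.groupby("month")[["mendeley", "twitter"]].mean()
print(monthly)
```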
This initial analysis maps onto our common-sense understanding of how people use Facebook, Twitter, and Mendeley. It also showcases the difficulty of doing any analysis that spans a significant time period: for meaningful results, all analyses must take into account that older articles have had more time to accumulate mentions than newer articles. One clever technique, the “Sign Test,” can be used for this purpose (see it in action in this paper). While it does not help us fully answer our initial question, the result is consistent with the assumption in our hypothesis that Facebook posts and Twitter mentions happen closer to the date of publication than Mendeley saves.
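For illustration, here is a minimal exact sign test on hypothetical paired counts. Because it uses only the direction of each pairwise difference, not its size, it is insensitive to how long each article has had to accumulate mentions:

```python
from math import comb

# Hypothetical paired observations: e.g., mentions for the same articles
# measured under two conditions. Only the sign of each difference matters.
a = [5, 9, 4, 7, 6, 8, 3, 10]
b = [3, 7, 5, 4, 2, 6, 1, 9]

# Keep the non-zero differences and count how many are positive
diffs = [x - y for x, y in zip(a, b) if x != y]
n = len(diffs)
n_pos = sum(d > 0 for d in diffs)

# Two-sided exact sign test: under H0, signs are Binomial(n, 0.5)
k = min(n_pos, n - n_pos)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
print(p_value)  # → 0.0703125
```

A small p-value would indicate that one member of each pair is consistently larger than the other, which is the kind of directional claim the comparison above needs.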
Of course, more analysis is always needed; yet, as our research reminds us, any exploration, even the most seemingly frivolous, can yield unexpected results and raise interesting questions, thus enhancing our understanding of the world.
Please leave us your comments with your own interpretations and ideas about how to take our findings further. Or, better yet, download the data and perform some analysis yourself!
ALM, PLOS (2015): Cumulative PLOS ALM Report – February 2015. figshare. http://dx.doi.org/10.6084/m9.figshare.1367535