Friday, April 29, 2011

Pilot Experiment

I ran a small pilot experiment this week testing just two conditions. The experiment tested knowledge retention for users who were given quizzes while reading an article versus users who read the same article with no quizzes. Knowledge retention was measured with a quiz at the end, using different questions from any shown during the reading.

I started my work by learning PHP and mocking up some simple quiz pages with it. I created a 5-page workflow like this: instructions -> reading wiki page -> quiz questions page -> survey questions page -> completion page. Users can only move through the pages in a linear manner, and the time of each new page visit is recorded. User responses to quiz questions are also recorded and written out, along with the user's session ID, to a file on the server. PHP sessions are used to keep track of the user throughout the process.
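As a rough sketch of how the session tracking and logging could be wired up (the file names, variable names, and log format here are illustrative placeholders, not the exact ones in my implementation):

```php
<?php
// Illustrative sketch of one step in the linear 5-page workflow.
// The page order is enforced by a step counter kept in the PHP session.
session_start();

$pages = array('instructions.php', 'reading.php', 'quiz.php', 'survey.php', 'done.php');

// Initialize the session on the first visit.
if (!isset($_SESSION['step'])) {
    $_SESSION['step'] = 0;
    $_SESSION['uid']  = session_id();
}

// Record the timestamp of each new page visit, keyed by step.
$_SESSION['visit_times'][$_SESSION['step']] = time();

// Write any submitted quiz answers to a server-side log, tagged with the session ID.
if (!empty($_POST['answers'])) {
    $line = $_SESSION['uid'] . "\t" . time() . "\t" . json_encode($_POST['answers']) . "\n";
    file_put_contents('/var/data/quiz_responses.log', $line, FILE_APPEND | LOCK_EX);
}

// Advance strictly forward; users cannot skip ahead or revisit earlier pages.
$_SESSION['step'] = min($_SESSION['step'] + 1, count($pages) - 1);
header('Location: ' . $pages[$_SESSION['step']]);
?>
```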

I deployed the two versions (with and without the internal quiz) on Mechanical Turk, collecting 10 responses per condition.
The hypothesis is that users will perform better on the final quiz if they receive the treatment with quizzes interspersed throughout the article. The null hypothesis is that both treatments produce the same final scores.
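Stated formally (with μ denoting the mean final-quiz score in each condition):

```latex
H_0:\ \mu_{\text{quiz}} = \mu_{\text{no quiz}}
\qquad \text{vs.} \qquad
H_1:\ \mu_{\text{quiz}} > \mu_{\text{no quiz}}
```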

Assuming a desired statistical power of 0.8 (a good guess based on Wikipedia reading...) and a Cohen's d of ~0.2 (i.e., I expect the mean test scores to differ by about 0.2 standard deviations, with the standard deviations staying roughly the same between the two conditions), I should be looking at a sample size of about 400 per condition to reliably detect the effect. Since this is just a pilot trial and samples cost money on Mechanical Turk, I opted for a much smaller sample this time.
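For reference, here is the back-of-the-envelope calculation behind that ~400 figure, using the standard normal-approximation formula for a two-sample comparison at a two-sided α = 0.05 with power 0.8:

```latex
n \;\approx\; \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}
  \;=\; \frac{2\,(1.96 + 0.84)^2}{0.2^2}
  \;\approx\; 392 \text{ participants per condition}
```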

I think I want to use a t-test to statistically analyze my results, since it compares the means of two groups whose values follow a roughly Gaussian distribution, which should be the case for test scores (with enough questions). A chi-squared test would not work because the outcome is a numeric score rather than counts of categorical responses.
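To make the comparison concrete, here is a minimal sketch of the test I have in mind, Welch's two-sample t statistic computed directly in PHP (the score arrays are made-up placeholders, not real pilot data):

```php
<?php
// Sketch: Welch's two-sample t statistic on final-quiz scores.
// Compare against a t distribution (stats table or package) to get the p-value.

function mean_of($xs) {
    return array_sum($xs) / count($xs);
}

function sample_variance($xs) {
    $m  = mean_of($xs);
    $ss = 0.0;
    foreach ($xs as $x) {
        $ss += ($x - $m) * ($x - $m);
    }
    return $ss / (count($xs) - 1);
}

function welch_t($a, $b) {
    $se = sqrt(sample_variance($a) / count($a) + sample_variance($b) / count($b));
    return (mean_of($a) - mean_of($b)) / $se;
}

$with_quizzes    = array(8, 7, 9, 6, 8, 7, 9, 8, 7, 8);  // placeholder final-quiz scores
$without_quizzes = array(6, 7, 5, 8, 6, 7, 6, 5, 7, 6);  // placeholder final-quiz scores

echo "t = " . welch_t($with_quizzes, $without_quizzes) . "\n";
?>
```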

More on pilot results later...

Buddy Post


My Buddy is Ravi, and his blog is located here:  http://cs303blog.blogspot.com/

The information you’ve gathered on the psychological reasons behind procrastination and its nature was really interesting to me. I can completely relate to the finding that more choices or more difficult tasks induce the most procrastination. I really like that your hypothesis is clean, concise, and easily testable – qualities my own hypothesis is sorely lacking right now.

Some feedback on your experimental design as listed: as I understand it, each time a user goes to a greenlisted site, you measure whether latency was added, whether the user closes the tab or switches to another tab before the page loads, and the length of the user's visit to that tab. I see some potential areas of concern in the tab-switching and visit-length measurements. The length of a user's visit to any website, including a procrastination website, can depend heavily on many factors completely outside your study, such as the user's mood, the time of day, how much time they have, how many other things they need to do, outside interruptions and stimuli, and even how much new content the site has since they last checked. So I don't know how comparable the visit-length measurements would be between the two conditions. As for measuring whether the latency prompted users to switch to other tabs while waiting for the page to load, I feel this could depend heavily on user preference. Personally, if a site I am trying to reach takes more than a few seconds to load, I will automatically go to another tab and do another short task, then come back a few seconds or minutes later to check that it loaded and use it. Switching tabs doesn't mean I won't visit the site; it just means I visit a different site (often another procrastination site) while waiting for it to load. This could even result in more procrastination... if the user opts to open two sites because one is too slow.
Also, what will you do about sites like Facebook, which auto-update periodically if left open? Many people I know browse Facebook through the feed, leave it open all the time, and just tab back to it to check anything new in the feed. Will this type of browsing behavior still be recorded in your study? Or are you not allowing users to keep tabs to multiple procrastination sites open at the same time, as a control for this?

One possible modification to your experiment that might get around some of these issues: measure the rate at which users go to procrastination sites (or how many they visit in the time period), give different users different rates of added latency, and see whether the rate of added latency correlates with the rate of visiting procrastination sites. This still has the issue of needing a per-person baseline of how often they visit these sites normally. And even though procrastination varies greatly from week to week, maybe if you measure long enough and collect enough data you can average out the weekly variations.

Your project is looking great and I am excited to see the results. Hopefully you can cure the procrastinator in us all!

Friday, April 22, 2011

Experimental Draft 2

This week I was looking into research on the efficacy of video lecture learning versus traditional in-class lectures. There have been many studies in this area recently as video lectures become more prevalent, and the general consensus is that there is no statistically significant evidence that video lectures are less effective than in-person lectures, although the majority of people prefer the experience of a regular lecture [1,2]. There is some evidence that watching video recordings of lectures you have already attended in person can improve how well you do in a course [1]. There has also been some research into how best to present video lectures for optimal learning [1,3]. In order to test a hypothesis on distance vs. traditional learning, I would need to measure students' learning in some way, which can be done by administering quizzes, so I then researched quizzes and learning a bit [1,4]. I had planned on doing in-class quiz trials and possibly online lecture quiz trials, but after reading about [7], testing quiz efficacy on online learning through Wikipedia seemed like a great approach (and also a great way to get participants, since so many people seem interested).

Papers:
[1] M.B. Wieling and W.H.A. Hofman. "The impact of online video lecture recordings and automated feedback on student performance." Computers & Education, 2010.
[2] Benjamin E. Schreiber, Junaid Fukuta, and Fabiana Gordon. "Live lecture versus video podcast in undergraduate medical education: A randomised controlled trial." 2010.
[3] Ading, Tanja, Astrid Gruber, and Bernad Batinic. "Learning with e-lectures: the meaning of learning strategies." Educational Technology & Society 12.3 (July 2009): 282(7). Academic OneFile. Gale. Stanford University Libraries. 22 Apr. 2011. http://find.galegroup.com/gtx/start.do?prodId=AONE&userGroupName=stan90222
[4] Sansone, Carol, Tamra Fraughton, Joseph L. Zachary, Jonathan Butner, and Cecily Heiner. "Self-regulation of motivation when learning online: the importance of who, why and how." Educational Technology Research and Development 59.2 (April 2011): 199(14). Academic OneFile. Gale. Stanford University Libraries. 22 Apr. 2011. http://find.galegroup.com/gtx/start.do?prodId=AONE&userGroupName=stan90222
[5] Steenhuis, Harm-Jan, Brian Grinder, and Erik Joost De Bruijn. "The use(lessness) of online quizzes for achieving student learning." International Journal of Information and Operations Management Education 3.2 (Jan 18, 2010): 119. Academic OneFile. Gale. Stanford University Libraries. 22 Apr. 2011. http://find.galegroup.com/gtx/start.do?prodId=AONE&userGroupName=stan90222
[6] Johnson, Danette Ifert, and Kaleigh Mrowka. "Generative learning, quizzing and cognitive learning: an experimental study in the communication classroom." Communication Education 59.2 (April 2010): 107(17). Academic OneFile. Gale. Stanford University Libraries. 22 Apr. 2011. http://find.galegroup.com/gtx/start.do?prodId=AONE&userGroupName=stan90222
[7] http://www.reddit.com/r/AskReddit/comments/gubge/if_wikipedia_were_to_have_a_quiz_section_at_the/

Hypotheses: I could test multiple hypotheses related to quizzes and/or online vs. traditional learning. One is that multiple shorter quizzes placed periodically throughout the lecture/material are more effective than a single longer quiz at the end. Also, I think that providing immediate feedback on quizzes, versus delayed feedback or no feedback, will increase the user's satisfaction and learning. I could also test the effect of notifying the user of a quiz beforehand (or not) on quiz performance and learning, and the effect of providing a "pre-quiz" beforehand on post-quiz performance and learning. Finally, I could test the effect of positive feedback (even fake positive feedback) on user performance - for example, by keeping track of the user's percentage of correct answers in real time and letting them see it (but fabricating it for some users).

Experimental Method: I think building a Wikipedia quiz website would be a great way to test several of these quiz hypotheses and garner a larger number of participants than an in-class quiz trial. The experiment would involve picking several subjects of general interest on Wikipedia and writing several quiz questions for each subject based on the material on the page. Then several different versions of the Wikipedia quiz site would be generated, possibly including:

1. The article followed by several quiz questions at the end
2. The article with single questions intermixed with the subsections
A. The viewing page with a notification that quizzes were present
B.  The viewing page with no prior notification that quizzes would be appearing. 
AA. The viewing page keeping track of user's performance in real time
BB. The viewing page always showing user's performance  as well above average
CC. The viewing page always showing the user's performance as well below average 

Setups 1 and 2 could be combined with any of the other setups. The A/B comparison would show any possible effect of having prior knowledge of the quiz on quiz performance and on learning. The AA/BB/CC comparison could show any effect of positive / negative feedback on user performance. I am still considering different variations of these trials, or different trial setups entirely - the pilot study might help with this. For all trials, after viewing a page and completing a quiz, the user could complete a short survey on their experience with the page and quiz, measuring their enjoyment and how much they feel they learned using a Likert scale. The relevant data gathered during the process would include the quiz responses, the amount of time spent on the page, and the survey responses. I am weighing the idea of allowing users to go through several pages vs. just one. 
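As a rough sketch of how users might be randomly assigned to a combination of these setups when they first arrive (the condition names are placeholders for illustration):

```php
<?php
// Sketch: assign each new session one quiz-placement condition and one feedback
// condition at random, and remember the assignment for the rest of the workflow.
session_start();

if (!isset($_SESSION['conditions'])) {
    $placement = array('end_quiz', 'interspersed');       // setups 1 / 2
    $feedback  = array('real', 'inflated', 'deflated');   // setups AA / BB / CC

    $_SESSION['conditions'] = array(
        'placement' => $placement[array_rand($placement)],
        'feedback'  => $feedback[array_rand($feedback)],
        'notified'  => (bool) rand(0, 1),                  // setups A / B
    );
}

// Later pages branch on $_SESSION['conditions'] to decide what to render,
// and the assignment is logged alongside each user's quiz and survey responses.
?>
```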

The website for collecting this data could be deployed and advertised online, and if there is trouble finding participants, a service like Mechanical Turk could be used to get users to complete the reading and quizzes, although this is less ideal than finding people genuinely interested in the topic.

Friday, April 15, 2011

Experiment Draft 1

*Just to note, this is completely a draft, and I am not even fully decided on the topic of the experiment*

The experiment I am thinking about doing right now is something in the realm of measuring conversational involvement, or some specific aspects of conversational involvement, within different types of communication. Conversational involvement is a measure of how cognitively and behaviorally engaged participants are in a conversation. It measures specific nonverbal behaviors along five dimensions: immediacy, expressiveness, interaction management, altercentrism, and social anxiety. Prior work on conversational involvement identifies a number of behaviors that strongly distinguish high from low involvement in conversations. Some of these behaviors include general proxemic attentiveness, forward lean, relaxed laughter, coordinated speech, the number of silences and latencies in communication, the number of object manipulations, facial animation, vocal warmth, and the amount of random body movement. Some of these can easily be measured quantitatively, such as silences/pauses, forward lean, and the amount of relaxed laughter. I want to explore conversational involvement during different online interactive communications vs. in-person communications - and I want to see the effect of an initial in-person conversation with controlled high or low conversational involvement on subsequent online interactive communications.

An initial draft of the experiment would involve videotaping short conversations on set topics between two people in person. One person would be the study subject, and the other would be a 'tester'. The tester would vary their conversational involvement between trials from high to low by changing their behavior according to the conversational involvement metrics. The subjects would later be recorded engaging in a subsequent video chat or audio conversation with the tester, and the video could be analyzed to extract their level of conversational involvement during these sessions. The subjects would also be recorded in another chat with a random tester they had not met before, and the conversational involvement in these two trials would be compared. The subjects' conversational involvement in the second conversation would also be compared across subjects, depending on whether the tester showed low or high conversational involvement in the initial interaction.