To answer the question “does coffee get cold faster than tea?” we need to measure how temperature changes with time. As an initial reference, I ran a simple experiment with just boiling water.

A first approach can be seen in the following time-lapse video:

We can watch the video frame by frame and write down temperature and time. An easier approach is to write down the time only when the temperature changes. The advantage of this approach is that we only need a cellphone, a thermometer and a timer. In fact, we can even omit the timer if we know the frame rate of the camera, since most cameras are very precise in their timing. I included a timer just to have a reference; I’m not even sure that it ticks exactly once per second.
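The idea of replacing the timer with the frame rate can be sketched in a few lines of Python. This is only an illustration: the function name `frame_to_seconds` and the capture rate used below are assumptions, not values from the experiment. For a time-lapse, the rate that matters is how often frames were captured, not how fast the video plays back.

```python
def frame_to_seconds(frame_index: int, frames_per_second: float) -> float:
    """Elapsed time, in seconds, between frame 0 and the given frame."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return frame_index / frames_per_second


# Hypothetical capture rate, for illustration only:
# one frame every 2 seconds is 0.5 frames per second.
CAPTURE_RATE = 0.5

# Frame 300 of such a recording was taken 600 seconds after frame 0.
elapsed = frame_to_seconds(300, CAPTURE_RATE)
print(elapsed)  # 600.0
```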

From this first experiment we can learn a couple of things. First, a cup of water takes more than ten thousand seconds (about 2 hours 46 minutes) to cool down to room temperature: the temperature was 93°C in the first frames and 27°C in the last. Second, the temperature does not change very fast, except at the beginning.

This means that we need to record the temperature for several hours, but we only need a sample every few seconds, or even once every minute. The cooling time will also depend on the volume of water. When I use a small hot-water bottle, my feet get cold during the night, but when I use a big bottle, I’m fine.

Another thing that we notice is that ambient temperature will have an effect on the cooling. So we need a second thermometer to keep track of the air temperature.

The disadvantage of this approach is that it requires boring manual work to extract the data. We can do better. We have robots that work for us.

I prepared a second experiment with a data logger. The device registered water temperature, air temperature, air pressure and altitude, every 10 seconds (nominally). I prepared the experiment in the late evening and let it run all night, near a closed window.

The air pressure is probably unnecessary for the current experiment, but the device was already measuring it, so keeping these values will help us to prepare for when we measure the building height.

All the data is on Google Sheets and in a text file, which is easier to process with R. The analysis can be done in several ways: the easy parts are easier in the spreadsheet, and the advanced parts are easier in R. Choose your tools wisely; they will serve you all your life.

The first thing to do is to familiarize oneself with the data. It is a good idea to add a column with the row number, which we usually call row id or just id. Please notice that the starting value of seconds is arbitrary: it only reflects how long the machine had been turned on before the first measurement.

id	seconds	temp_water	temp_air	pressure	altitude
1	6893	25.0625	26.3804	101279	3.62597
2	6903	25.0000	26.3927	101301	1.91472
3	6913	25.0000	26.3191	101300	2.00618
4	6922	27.6250	26.2700	101308	1.33930
5	6932	32.3125	26.1964	101301	1.91281
6	6942	38.8125	26.1534	101308	1.33930
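As a rough sketch of this first step, the following Python snippet parses tab-separated text, adds the id column, and also computes the elapsed time since the first sample, which removes the arbitrary offset in seconds. It uses only the six rows shown above, embedded as a string because the name of the real text file is not given here. The function name `read_log` and the extra `elapsed` column are assumptions made for illustration. The author works in R or Google Sheets, so this is just one possible equivalent.

```python
import csv
import io

# The six rows shown above, without the id column (tab-separated).
RAW = (
    "seconds\ttemp_water\ttemp_air\tpressure\taltitude\n"
    "6893\t25.0625\t26.3804\t101279\t3.62597\n"
    "6903\t25.0000\t26.3927\t101301\t1.91472\n"
    "6913\t25.0000\t26.3191\t101300\t2.00618\n"
    "6922\t27.6250\t26.2700\t101308\t1.33930\n"
    "6932\t32.3125\t26.1964\t101301\t1.91281\n"
    "6942\t38.8125\t26.1534\t101308\t1.33930\n"
)


def read_log(text):
    """Parse tab-separated log text and add a 1-based row id."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row_id, record in enumerate(reader, start=1):
        row = {"id": row_id}
        row.update({name: float(value) for name, value in record.items()})
        rows.append(row)
    return rows


rows = read_log(RAW)

# Elapsed time since the first sample (hypothetical extra column).
for row in rows:
    row["elapsed"] = row["seconds"] - rows[0]["seconds"]

print(rows[3]["id"], rows[3]["elapsed"])  # 4 29.0
```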

After looking at the first few lines, it is always good to plot the data and see how it looks. This is easy in Google Sheets or Excel, and also in R. For now let us just focus on temperature.

**Figure 1.** Temperature measurements.

We see that the air temperature remains more or less constant during all the measurements, so it will be easier to analyze.

**Figure 2.** Air temperature measurements.

Well, it is not so constant. There is a sharp drop in temperature in the first seconds, and then it goes down more or less linearly. My guess is that the initial temperature was that of the room where I prepared the device, which then cooled to the temperature of the experiment room. After that, it cools as the night cools, until sunrise. The window faces east, so it gets warmer in the morning. In retrospect, it would have been wise to let the device cool to room temperature before starting, and to record the real time from a real clock.

A sharp eye may also notice that there are times when the temperature rises. My guess is that these are the times when the fridge motor was working. The experiment room was my kitchen. Refrigerators keep their interior at low temperature by transferring heat to the exterior, so the room gets warmer.

Evaluating uncertainty

We will use air temperature as our main variable. We can choose any point in time and take several values, let’s say 𝑁. For each position we want to evaluate the average, the standard deviation, and the standard error (the standard deviation divided by √𝑁). And we will do it for several values of 𝑁.

Here we evaluated 𝑁 equal to 3, 10, 20 and 30. You can try other values. The first rows of results look like the following tables.

Result for N = 3
id	avg	stdev	stderr
3	26.36	0.03942	0.02276
4	26.33	0.06176	0.03566
5	26.26	0.06176	0.03566
6	26.21	0.05897	0.03404
7	26.15	0.05247	0.03029
8	26.09	0.06764	0.03905

Result for N = 10
id	avg	stdev	stderr
10	26.18	0.1608	0.05085
11	26.13	0.163	0.05154
12	26.08	0.1512	0.04781
13	26.03	0.1416	0.04479
14	25.99	0.1302	0.04118
15	25.95	0.1203	0.03804

Result for N = 20
id	avg	stdev	stderr
20	25.97	0.246	0.055
21	25.93	0.2432	0.05438
22	25.89	0.2334	0.05218
23	25.85	0.2251	0.05034
24	25.81	0.2183	0.0488
25	25.77	0.2129	0.04761

Result for N = 30
id	avg	stdev	stderr
30	25.79	0.338	0.06171
31	25.75	0.3324	0.06068
32	25.71	0.3225	0.05887
33	25.67	0.3131	0.05717
34	25.64	0.3038	0.05547
35	25.61	0.296	0.05405
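The tables above can be approximated with a short Python sketch. It assumes that each row summarizes the 𝑁 most recent samples up to and including that id, and that the standard deviation is the sample version (dividing by 𝑁 − 1). Both assumptions are inferred from the first rows of the N = 3 table rather than stated by the author. The function name `rolling_stats` is hypothetical, and only the six air temperatures listed earlier are used.

```python
import math
import statistics

# Air temperatures (°C) of the six rows shown in the first table.
temp_air = [26.3804, 26.3927, 26.3191, 26.2700, 26.1964, 26.1534]


def rolling_stats(values, n):
    """Return (id, mean, stdev, stderr) for each trailing window of n samples.

    The id is the 1-based row number of the last sample in the window.
    """
    results = []
    for end in range(n, len(values) + 1):
        window = values[end - n:end]
        avg = statistics.mean(window)
        sd = statistics.stdev(window)  # sample standard deviation (n - 1)
        results.append((end, avg, sd, sd / math.sqrt(n)))
    return results


stats_n3 = rolling_stats(temp_air, 3)
for row_id, avg, sd, se in stats_n3:
    print(f"{row_id}\t{avg:.2f}\t{sd:.4g}\t{se:.4g}")
```

With the full data set, the same function can be called with 10, 20 or 30 as the window size.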

It seems that the standard error gets worse with bigger N, but this is just a transient phenomenon. Looking at the full picture, we observe the following plots.

**Figure 3.** Standard error with different sample sizes. All data.

In the first seconds the standard error is large. This is due to the fast change in the value we are measuring: there is a transient period before the temperature stabilizes. Things become clearer if we focus on the values after the transient.

**Figure 4.** Standard error with different sample sizes. Omitting 150 samples of transient.

We observe that the standard error is random, since we evaluate it from random data, but it follows some patterns. Being pessimistic, we can take the maximum value for each sample size. Then we can look up the Student’s t-distribution table to find the factor for 95% confidence. Finally, we calculate the uncertainty in each case.

Final Result
N	Max Std. Error	k (95%)	Interval width	Interval (1 sig. fig.)
3	0.01449	4.303	0.06236	0.06
10	0.006826	2.262	0.01544	0.02
20	0.005027	2.093	0.01052	0.01
30	0.004675	2.045	0.009562	0.01
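The arithmetic behind this table can be checked with a small Python sketch: the interval width is the maximum standard error multiplied by the t factor, and the last column rounds that product to one significant figure. The numbers are copied from the table above. The t factors correspond to 𝑁 − 1 degrees of freedom and are hard-coded here, since Python's standard library has no t-distribution. The helper `round_sig` is a hypothetical name.

```python
import math

# Maximum standard error after the transient, per sample size N
# (values copied from the table above).
max_std_error = {3: 0.01449, 10: 0.006826, 20: 0.005027, 30: 0.004675}

# Two-sided 95% Student's t factors for N - 1 degrees of freedom,
# as listed in the table above.
t_factor = {3: 4.303, 10: 2.262, 20: 2.093, 30: 2.045}


def round_sig(x, digits=1):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


# Interval width = maximum standard error times the t factor.
uncertainty = {n: max_std_error[n] * t_factor[n] for n in max_std_error}

for n, width in uncertainty.items():
    print(n, f"{width:.4g}", round_sig(width))
```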

Therefore, we can have at most one decimal when we average 3 samples, two decimals when we average 10 samples, and three decimals with 20 or 30 samples. We can see this in the following figure.

**Figure 5.** Rounded average temperatures for different sample sizes.

There seems to be no significant difference between averaging 20 and 30 samples.

Can you replicate these results?

Can you repeat this analysis for pressure?

About Time

According to the original design, there should be one sample every 10 seconds. But if we look at the difference in seconds between consecutive rows, we find that in over 34% of cases the timer advanced only 9 seconds.

**Figure 6.** Distribution of time differences.
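A distribution like this one can be computed by subtracting each timer value from the next and counting how often each difference appears. The Python sketch below uses only the six timer values from the first table, so its proportions are not the ones quoted for the full data set. The variable names are chosen for illustration.

```python
from collections import Counter

# Timer values (s) of the six rows shown in the first table.
seconds = [6893, 6903, 6913, 6922, 6932, 6942]

# Difference between each row and the previous one.
steps = [later - earlier for earlier, later in zip(seconds, seconds[1:])]

# How often each difference occurs, as a fraction of all differences.
counts = Counter(steps)
share = {step: count / len(steps) for step, count in counts.items()}

print(steps)  # [10, 10, 9, 10, 10]
print(share)  # {10: 0.8, 9: 0.2}
```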

So the measuring device was sampling faster than intended. How fast was it running?

**Figure 7.** Comparison of expected time versus real time of sampling.

We see that instead of “10 seconds every 10 seconds” we have a little less.

How much time really passes between samples?
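One possible way to approach this question is to divide the total time span by the number of intervals. The Python sketch below does this for the six timer values shown in the first table, so the result describes only those rows and is measured by the device's own timer. The function name `mean_interval` is hypothetical. The full data set may give a different value.

```python
def mean_interval(seconds):
    """Average time between consecutive samples: total span / number of steps."""
    if len(seconds) < 2:
        raise ValueError("need at least two samples")
    return (seconds[-1] - seconds[0]) / (len(seconds) - 1)


# Timer values (s) of the six rows shown in the first table.
sample = [6893, 6903, 6913, 6922, 6932, 6942]

print(mean_interval(sample))  # 9.8
```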

I look forward to your comments.