6  Estimating Demand of Non-durable Goods Using R

Author

Nile Hatch

6.1 Non-durable Goods

In the world of entrepreneurial analytics, understanding customer demand is a pivotal factor. In the previous chapter, we explored the intricacies of durable goods. Now, let’s dive into non-durable goods, another commonly-encountered dimension of demand analysis in startups.

Non-durable goods are products that are typically consumed in a single use, such as food, toiletries, or movie tickets. These items are characterized by their transient nature and limited lifespan. Unlike durable goods that last for an extended period, non-durable goods are usually exhausted after one use. As a result, we normally expect that consumers will purchase more than one unit per period.

It is important to treat non-durable goods correctly: treating them like durable goods and asking only about consumers’ willingness to pay overlooks the multiple units that each consumer intends to purchase.

Examples of Non-durable Goods:

  • Food and Beverages: Items like fruits, vegetables, soft drinks, and packaged snacks fall into this category. They are consumed quickly and need replenishment.
  • Toiletries: Products such as toothpaste, shampoo, and soap are non-durable because they get used up with each application.
  • Tickets: Movie tickets and concert tickets are also considered non-durable as they grant access for a specific event or period.

To effectively analyze non-durable goods demand, it’s crucial to gather both price and quantity data. Unlike durable goods, where consumers typically buy one unit for an extended period, non-durable goods often involve repeat purchases within a given time frame. Therefore, understanding how price influences the quantity that individual consumers intend to purchase is essential for informed decision-making in entrepreneurial ventures.

The primary approach to collecting price-quantity data for non-durable goods is to ask potential customers how many units they would purchase at a series of prices, e.g., $10, $20, etc. You name the price and they name the quantity, giving us the price-quantity pairs we need to estimate their demand model. The major drawback is that you will be asking many similar questions. If you ask too many, your respondents will suffer survey fatigue and stop answering.


6.2 Quantity Data at Various Prices

Gathering Price-Quantity Data

Collecting data about how many units potential customers would purchase at various prices is a vital step in understanding demand for non-durable goods. However, the way you approach and interact with potential customers during this data collection process can significantly impact the quality of insights you gather.

Here are some essential tips for effectively interacting with potential customers:

  1. Set the Context: Begin your conversation by providing context about the non-durable good. Explain its unique features, benefits, and how it addresses specific needs or problems. Help potential customers visualize how the product fits into their lives.

  2. Engage in a Dialogue: Before jumping into questions about pricing, engage customers in a conversation about the product.

    • The Wow Factor Test: Start by asking them to rate their initial impression of the product on a scale of 1 to 10, without predefined meanings for 1 or 10. This not only breaks the ice but also provides valuable feedback.
    • Open-Ended Queries: Encourage customers to share what they like the most and least about the product. These open-ended questions can uncover insights you might have missed.
    • Seek Suggestions: Ask for suggestions on how the product could better meet their needs. This not only provides improvement ideas but also makes customers feel valued and engaged in the product’s development.
  3. Introduce the Pricing Question Thoughtfully: When transitioning to the question about how many units they would purchase at different prices, do so thoughtfully.

    • Express Genuine Interest: Preface the pricing question by expressing genuine interest in their opinions. Explain that you’re passionate about the product’s potential and sincerely value their input.
    • Frame the Question: Phrase the pricing question as an extension of their evaluation: “Given the product’s features and benefits we’ve discussed, at a price of $3, how many units would you buy?” “At a price of $3.50, how many units would you buy?” and so on. This framing makes it feel like another aspect of rating the product rather than a commitment.
  4. Systematic Data Recording: Ensure that you systematically record the responses to the price-quantity questions, along with any additional feedback or comments provided by customers. This documentation will help you identify trends and patterns in consumer sentiment over time.

  5. Show Appreciation: Always express gratitude for their time and insights. Let customers know that their feedback plays a vital role in shaping the product and its journey.

In summary, gathering data about how customers would respond to various prices for non-durable goods is not just about numbers; it’s about creating a meaningful dialogue, understanding perceptions, and making potential customers feel valued in the product development process. When approached with care and genuine interest, this interaction becomes a valuable part of the journey towards mutual value creation.

Table 6.1 provides a small sample of demand data using this approach where 23 potential target customers were asked the quantity they were willing to purchase at every price in a series. The data are in a tibble named pq_data and here is a glimpse() of that data.

Price
$0.00 $0.50 $1.00 $1.50 $2.00 $2.50 $3.00
10 5 4 2 2 1 1
1 0 0 0 0 0 0
1 1 1 1 0 0 0
4 2 1 1 1 1 1
1 1 1 0 0 0 0
10 3 0 0 0 0 0
1 0 0 0 0 0 0
10 1 0 0 0 0 0
1 1 0 0 0 0 0
1 0 0 0 0 0 0
3 2 1 1 1 0 0
1 1 0 0 0 0 0
2 1 0 0 0 0 0
4 3 3 3 2 1 1
6 5 3 2 1 0 0
2 2 2 2 2 0 0
1 1 1 1 0 0 0
1 1 0 0 0 0 0
2 2 1 0 0 0 0
2 2 1 1 0 0 0
4 4 4 4 3 3 2
5 4 3 2 1 1 1
8 2 2 1 0 0 0
Quantity (total) 81.00 44.00 28.00 21.00 13.00 7.00 6.00
Table 6.1: Quantity demanded by the sample as the sum of quantity demanded of each respondent at various named prices.

Calculating Price-Quantity Variables

Quantity data usually come from survey platforms in the format we see in Table 6.1. We ask how many they would buy at $0.00, $1.00, etc., and the survey platform sets up each price as a column for responses. Unfortunately, this data format is not tidy.

Having tidy data means that every column is a variable and every row is an observation. The columns of this data are for each price in the series we named. In other words, they are values of an underlying price variable rather than different variables. We really only have two variables here: price and the quantity the respondent would buy at that price. To tidy the data, we use R’s pivot_longer() function.

Constructing your AI prompt:

The pivot_longer() code we need is not very intuitive for your AI to understand. In other words, the prompt you give it needs to be very specific or it will give you code that does something different from what you need. I recommend laying it out as a list. If you edit the following to match the particulars of your data, it should work reliably.

To transform your data using pivot_longer(), follow these detailed steps:

  1. Provide a Glimpse of Your Tibble: Run the glimpse() function on your data and share the output with your AI. Use this format:
    Here is a glimpse of my data:
    (Then paste the output of glimpse(your_tibble_name))

  2. Specify Columns for Transformation: Clearly state which columns you need to transform. For instance:
    I want to use pivot_longer() on columns q_p_050 to q_p_250.

  3. Define New Column Names:
    The values from these columns should be placed in a new column named quantity.
    The names of these columns should become values in a new column named price.

  4. Provide Column Name to Value Conversion: Explain how column names should be converted to values:
    For the conversion, q_p_050 should become 0.50, q_p_100 should become 1.00, and so on.

After receiving the code:

  • Verify Column Count: Ensure you have the correct number of columns. You should see two new columns (price and quantity), and the original columns in the specified range should be gone.

  • Verify Row Count: The number of rows in the transformed tibble should equal the number of original price values multiplied by the rows in the original tibble.

This structured approach will help AI provide you with the most accurate code for your requirements.
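
As a rough illustration of the kind of code such a prompt might return, the sketch below pivots a small made-up tibble and then runs the two checks described above. The tibble tb, its id column, and its values are hypothetical; only the q_p_050 to q_p_250 naming follows the prompt.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: 3 respondents, one column per named price
tb <- tibble(
  id      = 1:3,
  q_p_050 = c(5, 0, 1),
  q_p_100 = c(4, 0, 1),
  q_p_150 = c(2, 0, 1),
  q_p_200 = c(2, 0, 0),
  q_p_250 = c(1, 0, 0)
)

tb_long <- tb |>
  pivot_longer(
    cols         = q_p_050:q_p_250,  # columns to transform
    names_to     = "price",          # old column names become values of price
    names_prefix = "q_p_",           # strip the prefix, leaving "050", "100", ...
    values_to    = "quantity"        # old cell values become values of quantity
  ) |>
  mutate(price = as.numeric(price) / 100)  # "050" becomes 0.50

# Check 1: the price columns are gone; price and quantity are new
names(tb_long)

# Check 2: rows = number of price columns x rows in the original tibble
n_prices <- 5
nrow(tb_long) == n_prices * nrow(tb)
```

If either check fails on your own data, the column range or the name-to-value conversion in the prompt is the first place to look.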

Here is the text of a sample prompt that can be edited, copied, and pasted directly into your AI:

AI prompt for calculating price and quantity

Here is a glimpse(tb) … Please provide the tidyverse code to pivot_longer() on the columns q_p_050 to q_p_250 where the column values should be named quantity and the column names are the values of a variable named price. Note that the column names need to be converted to values q_p_050 = 0.50, q_p_100 = 1.00, etc.

Code
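# Load the tidyverse for pivot_longer(), mutate(), and parse_number()
library(tidyverse)

# Assuming your tibble is named pq_data and every column is a price column
# whose name contains the price (e.g., "$0.50")
pq_long <- pq_data %>%
  pivot_longer(cols = everything(),      # pivot every price column
               names_to = "price",       # column names become values of price
               values_to = "quantity") %>%
  mutate(price = parse_number(gsub("\\$", "", price)))  # "$0.50" becomes 0.50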

Table 6.2 provides 10 rows out of the 161 rows of the tidy demand data where every row is the quantity that an individual respondent would buy at the named price. The tibble is made longer to have 161 rows because the data are pivoted to show 7 rows (one for each price) for every one of the 23 respondents.

price quantity
0.0 10
0.5 5
1.0 4
1.5 2
2.0 2
2.5 1
3.0 1
0.0 1
0.5 0
1.0 0
Table 6.2: Tidy data of the quantities that would be purchased at every price by every respondent.

Estimate the Demand Curve

The data in the new tibble pq_long are tidy because we have one column for each of price and quantity and every row is an observation of the quantity purchased at each price by each respondent.

Now let’s visualize the relationship:

Figure 6.1: Visualization of the price quantity relationship of a non-durable good.
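
The code behind Figure 6.1 is not shown, so the following is only a sketch of how a similar plot might be drawn with ggplot2. It re-enters the quantities from Table 6.1 directly in long form (ordered by price rather than by respondent, which does not affect the plot). The jitter and transparency settings are assumptions chosen to keep overlapping points visible.

```r
library(dplyr)
library(ggplot2)

# Quantities from Table 6.1, entered one named price at a time (23 respondents each)
pq_long <- tibble(
  price = rep(seq(0, 3, by = 0.5), each = 23),
  quantity = c(
    10, 1, 1, 4, 1, 10, 1, 10, 1, 1, 3, 1, 2, 4, 6, 2, 1, 1, 2, 2, 4, 5, 8,  # $0.00
    5, 0, 1, 2, 1, 3, 0, 1, 1, 0, 2, 1, 1, 3, 5, 2, 1, 1, 2, 2, 4, 4, 2,     # $0.50
    4, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 3, 3, 2, 1, 0, 1, 1, 4, 3, 2,     # $1.00
    2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 3, 2, 2, 1, 0, 0, 1, 4, 2, 1,     # $1.50
    2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 1, 2, 0, 0, 0, 0, 3, 1, 0,     # $2.00
    1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 1, 0,     # $2.50
    1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 0      # $3.00
  )
)

demand_plot <- ggplot(pq_long, aes(x = price, y = quantity)) +
  geom_jitter(width = 0.05, height = 0.1, alpha = 0.5) +     # spread overlapping points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # straight-line fit for reference
  labs(x = "Price ($)", y = "Quantity per respondent")

demand_plot
```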

The variance in the data at every named price makes it more difficult to verify that the data are linear, but the impact of price on quantity is clearly negative. We can use linear regression to estimate the model and evaluate the relationship between price and quantity.

Estimation of the Linear Demand Curve

Let’s first estimate the relationship as linear and then test for a nonlinear demand curve. We start with the linear demand regression:

Code
linear_demand_model <- lm(quantity ~ price, pq_long)
summary(linear_demand_model)

Call:
lm(formula = quantity ~ price, data = pq_long)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2174 -0.7547 -0.2671  0.2950  7.2950 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.7050     0.2266  11.937  < 2e-16 ***
price        -0.9752     0.1257  -7.758 9.79e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.595 on 159 degrees of freedom
Multiple R-squared:  0.2746,    Adjusted R-squared:   0.27 
F-statistic: 60.19 on 1 and 159 DF,  p-value: 9.787e-13

Model Interpretation:

  • In terms of overall model fit, the F-statistic of 60.19 is significantly different from zero, meaning that price explains a statistically significant share of the variation in quantity. The R-squared value is 0.2746, meaning that we are explaining 27.46 percent of the variance of quantity. Given the variance we saw in the plot, it is not surprising that the R-squared is not terribly high.

  • In terms of the impact of price on quantity, the estimated coefficient for price is negative, as the law of demand dictates. The coefficient is also significantly different from zero, with a t-value of -7.758 and a p-value of \(\mathsf{9.79 \times 10^{-13}}\), which is very close to zero.

  • The estimated demand curve for the non-durable good is \[ \mathsf{Q = 2.7050 - 0.9752 P} \] for each potential customer. This demand curve suggests that for every $1.00 increase in the price of the non-durable good, the quantity demanded decreases by 0.9752 units of the product per person.

  • Visually, it was not clear that the relationship is linear, so let’s test whether a nonlinear form describes the data better.

Estimation of the Nonlinear (exponential) Demand Curve

In some cases, the demand relationship for non-durable goods follows a nonlinear exponential form, which has the mathematical structure:

\[\mathsf{Q = a \cdot e^{bP}}\]

Here, \(\mathsf{e}\) is the base of the natural logarithm (computed using exp() in R). Unfortunately, we cannot estimate this nonlinear relationship directly because traditional linear regression assumes the form \(\mathsf{y = b_0 + b_1 x}\), which the exponential demand curve does not conform to.

However, there’s good news! We can transform the data to “linearize” it, making it possible to estimate a linear relationship. To linearize the data, we take the natural logarithm (using log() in R) of both sides of the demand curve equation, resulting in:

\[\mathsf{log(Q) = \alpha + bP}\]

where \(\mathsf{\alpha = log(a)}\). In simpler terms, we log-transform the dependent variable (quantity) and then proceed to estimate a linear regression.
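
To see why the log transformation works, here is a small sketch using made-up, noise-free data generated from an exponential demand curve. The values of a_true and b_true are arbitrary assumptions; the point is only that a linear regression of log(Q) on P recovers them.

```r
# Hypothetical exponential demand curve: Q = a * exp(b * P)
a_true <- 50
b_true <- -0.8

P <- seq(0.5, 3, by = 0.5)
Q <- a_true * exp(b_true * P)

# Linearized model: log(Q) = alpha + b * P, where alpha = log(a)
fit <- lm(log(Q) ~ P)

alpha_hat <- unname(coef(fit)[1])
b_hat     <- unname(coef(fit)[2])

# Undo the log on the intercept to recover a
a_hat <- exp(alpha_hat)

c(a = a_hat, b = b_hat)
```

With real survey data the points scatter around the curve, so the estimates only approximate the underlying a and b.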

Here’s the R code to estimate the nonlinear exponential demand curve and the corresponding output:

Code
exponential_demand_model <- lm(log(quantity + 1) ~ price, pq_long)
summary(exponential_demand_model)

Call:
lm(formula = log(quantity + 1) ~ price, data = pq_long)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.95245 -0.39162 -0.04079  0.33310  1.25850 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.13940    0.07253  15.710   <2e-16 ***
price       -0.37389    0.04023  -9.294   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5105 on 159 degrees of freedom
Multiple R-squared:  0.352, Adjusted R-squared:  0.3479 
F-statistic: 86.37 on 1 and 159 DF,  p-value: < 2.2e-16
Code
summary(pq_long)
     price        quantity     
 Min.   :0.0   Min.   : 0.000  
 1st Qu.:0.5   1st Qu.: 0.000  
 Median :1.5   Median : 1.000  
 Mean   :1.5   Mean   : 1.242  
 3rd Qu.:2.5   3rd Qu.: 2.000  
 Max.   :3.0   Max.   :10.000  

Model Interpretation:

Following the linearization of the data, we’ve estimated the demand model as a nonlinear exponential demand curve. Notice that because the quantity variable has a minimum value of 0, the log of quantity at that value is not defined, so we had to add one (log(quantity + 1)) to be able to run the regression. This means that the regression model is based on \(\mathsf{log(quantity + 1)}\), not directly on \(\mathsf{quantity}\).

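Assessing the goodness of fit, observe that the F-statistic of 86.37 is significantly different from zero, meaning that price explains a statistically significant share of the variation. The R-squared value is 0.352, meaning that we are explaining 35.2 percent of the variance of log(quantity + 1). This is higher than the 27.46 percent explained by the linear model, which suggests that the exponential form captures more of the variation, although the two R-squared values are not strictly comparable because the dependent variables differ.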

Regarding the impact of price on quantity, the estimated coefficient for price is negative, in accordance with the law of demand. This coefficient is significantly different from zero, as indicated by the t-value (-9.294) and the p-value (less than 2e-16), which is very close to zero.

The estimated nonlinear exponential demand curve for non-durable goods can be expressed as:

\[ \mathsf{log(Q + 1) = 1.1394 - 0.3739 P} \] or \[ \mathsf{Q = e^{1.1394 - 0.3739 P} - 1} \] where P is the named price and Q is the quantity demanded by an individual respondent.

Interpreting the exponential demand curve, for every $1.00 increase in the price of the non-durable good, the logarithm of the quantity demanded (plus one) by an individual respondent decreases by 0.3739. Equivalently, Q + 1 is multiplied by \(\mathsf{e^{-0.3739} \approx 0.69}\) for each $1.00 increase in price. It’s important to note that interpreting changes in the logarithm of a value may not be intuitive to most individuals.
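
Because changes in a logarithm are hard to read, it can help to convert the fitted equation back into units of the product. The sketch below does this with the coefficients reported in the regression output; the function name predict_quantity and the example prices are illustrative assumptions.

```r
# Coefficients reported in the regression output for log(quantity + 1) ~ price
intercept <- 1.1394
slope     <- -0.3739

# Back-transform: Q = exp(intercept + slope * P) - 1
# expm1(x) is base R's exp(x) - 1
predict_quantity <- function(price) {
  expm1(intercept + slope * price)
}

prices    <- c(0, 1, 2, 3)
predicted <- predict_quantity(prices)
data.frame(price = prices, predicted_quantity = round(predicted, 2))

# Each $1.00 increase multiplies (Q + 1) by exp(slope)
multiplier <- exp(slope)
multiplier
```

Reading the predictions in units per person is usually easier to explain to a non-technical audience than the change in the logarithm.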

This methodology enables us to estimate the demand curve for non-durable goods and understand how price influences consumer behavior in this context.


6.3 Aggregate Quantity Data at Various Prices

In Section 6.2, we gathered and tidied quantity data provided in response to a series of prices. As we saw, the resulting demand curve predicts demand for an individual customer. In Section 4.4.2, we gathered willingness to pay and quantity data that led us to a demand curve for the collective sample of respondents rather than individual respondents.

In this section, we will explore how to transition from individual demand curves, represented by price-quantity data for each respondent, to aggregated demand curves that reflect the preferences of the entire sample.

Understanding the Transition

To begin, let’s briefly clarify the difference between individual and aggregated demand curves:

  • Individual Demand Curves: These curves represent the demand behavior of individual consumers. Each respondent’s willingness to pay and corresponding quantity data form a unique demand curve. They provide insights into the preferences of specific consumers.
  • Aggregated Demand Curves: These curves reflect the combined demand patterns of the entire sample of respondents. They result from aggregating the individual demand data to create a collective representation of demand behavior. These curves aim to capture overall market trends.

Aggregating the Data

  • Grouping by Price: To aggregate the individual demand data, we start by grouping the data by price. This process consolidates responses at each price point.
  • Summarizing Quantity: Next, we summarize the data by calculating the total quantity demanded at each price level. This aggregation process transforms the individual quantity data into collective demand figures.

Prompt your AI for code for aggregation

This is one of those requests that can cause ChatGPT to generate many different commands, which usually means our prompts are too ambiguous. Here is a template prompt that you can edit and paste into ChatGPT to get the code you need. The key is to tell it the tibble name (call it a tibble and you will signal that you want tidyverse code, or ask for tidyverse code directly) and the price and quantity variables, and then ask it to aggregate the quantity variable for every price.

I have a tibble tb with price and quantity variables. I want to aggregate the quantity variable for every price.

Here’s the R code to achieve this aggregation:

Code
# Grouping data by price and summarizing quantity
aggregated_pq_data <- pq_long |>
  group_by(price) |>
  summarise(Q = sum(quantity))

price Q
0.0 81
0.5 44
1.0 28
1.5 21
2.0 13
2.5 7
3.0 6
Table 6.3: Price and quantity data aggregated for the sample of 23 respondents.

Notice that aggregation has reduced the data to just seven rows, one for each price we named when collecting data from the respondents. This reduction is unavoidable and usually leaves us with relatively small datasets. Naming more prices would add rows, but the more prices we name during data collection, the greater the bias from survey fatigue.

Estimate the Aggregated Demand Curve

First, let’s visualize the aggregated demand data with a scatterplot to gain insights into the overall demand trend. We’ll then proceed to estimate both a linear and exponential demand curve to assess potential nonlinearity.

Figure 6.2: Scatterplot of the relationship between aggregated price and quantity data for the sample of respondents.

The scatterplot in Figure 6.2 shows the same negative relationship in the aggregated data that we saw in the individual data in Figure 6.1 but with less variance. The nonlinear relationship is more pronounced in the aggregated data as well.
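
The plotting code for Figure 6.2 is not shown; one possible way to draw a comparable scatterplot from the totals in Table 6.3 is sketched below. A base R data frame stands in for the tibble here, and the dashed straight line is an added assumption, included only to make the curvature easier to see.

```r
library(ggplot2)

# Aggregated totals from Table 6.3
aggregated_pq_data <- data.frame(
  price = seq(0, 3, by = 0.5),
  Q     = c(81, 44, 28, 21, 13, 7, 6)
)

aggregate_plot <- ggplot(aggregated_pq_data, aes(x = price, y = Q)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, linetype = "dashed") +  # straight-line reference
  labs(x = "Price ($)", y = "Total quantity (23 respondents)")

aggregate_plot
```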

Estimate the Linear Aggregate Demand

We’ll start by estimating a linear demand curve for the aggregated data using regression analysis.

Code
aggregate_linear_model <- lm(Q ~ price, aggregated_pq_data)
summary(aggregate_linear_model)

Call:
lm(formula = Q ~ price, data = aggregated_pq_data)

Residuals:
       1        2        3        4        5        6        7 
 18.7857  -7.0000 -11.7857  -7.5714  -4.3571   0.8571  11.0714 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   62.214      8.292   7.503 0.000665 ***
price        -22.429      4.599  -4.876 0.004568 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.17 on 5 degrees of freedom
Multiple R-squared:  0.8263,    Adjusted R-squared:  0.7915 
F-statistic: 23.78 on 1 and 5 DF,  p-value: 0.004568

Model Interpretation:

  • In terms of overall model fit, the F-statistic of 23.78 is significantly different from zero, meaning that price explains a statistically significant share of the variation in quantity. The R-squared value is 0.8263, meaning that we are explaining 82.63 percent of the variance of quantity.

  • In terms of the impact of price on quantity, the estimated coefficient for price is negative, as the law of demand dictates. The coefficient is also significantly different from zero, with a t-value of -4.876 and a p-value of 0.0046, which is well below the conventional 0.05 threshold.

  • The estimated demand curve for the non-durable good is \[ \mathsf{Q = 62.2143 - 22.4286 P} \] for the sample of potential customers. This demand curve suggests that for every $1.00 increase in the price of the non-durable good, the quantity demanded decreases by 22.43 units of the product across the 23 respondents.

Estimate the Exponential Demand Curve

Now we linearize the data to estimate the nonlinear exponential demand curve.

Code
aggregate_exp_model <- lm(log(Q) ~ price, aggregated_pq_data)
summary(aggregate_exp_model)

Call:
lm(formula = log(Q) ~ price, data = aggregated_pq_data)

Residuals:
       1        2        3        4        5        6        7 
 0.10204 -0.07066 -0.08507  0.06481  0.02280 -0.15867  0.12475 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.29241    0.07952   53.98 4.13e-08 ***
price       -0.87513    0.04411  -19.84 6.01e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1167 on 5 degrees of freedom
Multiple R-squared:  0.9875,    Adjusted R-squared:  0.9849 
F-statistic: 393.6 on 1 and 5 DF,  p-value: 6.01e-06

Model Interpretation:

  • In terms of overall model fit, the F-statistic of 393.62 is significantly different from zero, meaning that price explains a statistically significant share of the variation. The R-squared value is 0.9875, meaning that we are explaining 98.75 percent of the variance of log(Q). The increase in R-squared and F-statistic relative to the linear model suggests that the demand curve is better described by an exponential form.

  • In terms of the impact of price on quantity, the estimated coefficient for price is negative, as the law of demand dictates. The coefficient is also significantly different from zero, with a t-value of -19.84 and a p-value of \(\mathsf{6.01 \times 10^{-6}}\), which is very close to zero.

  • The estimated exponential aggregate demand curve for the non-durable good is \[ \mathsf{log(Q) = 4.2924 - 0.8751 P} \] or \[ \mathsf{Q = e^{4.2924 - 0.8751 P}} \]
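
Since the two R-squared values are measured on different scales (Q versus log(Q)), one rough way to compare the models is to convert the exponential predictions back into units of the product and compare prediction errors directly. The sketch below refits both models on the totals from Table 6.3 using a base R data frame. The sum-of-squared-errors comparison is an added illustration, not part of the original analysis.

```r
# Aggregated totals from Table 6.3
aggregated_pq_data <- data.frame(
  price = seq(0, 3, by = 0.5),
  Q     = c(81, 44, 28, 21, 13, 7, 6)
)

aggregate_linear_model <- lm(Q ~ price, aggregated_pq_data)
aggregate_exp_model    <- lm(log(Q) ~ price, aggregated_pq_data)

# Predictions in units of the product; exp() undoes the log for the exponential model
comparison <- data.frame(
  price      = aggregated_pq_data$price,
  observed   = aggregated_pq_data$Q,
  linear_fit = unname(fitted(aggregate_linear_model)),
  exp_fit    = unname(exp(fitted(aggregate_exp_model)))
)
comparison

# Sum of squared prediction errors, both measured in units of Q
sse_linear <- sum((comparison$observed - comparison$linear_fit)^2)
sse_exp    <- sum((comparison$observed - comparison$exp_fit)^2)
c(linear = sse_linear, exponential = sse_exp)
```

A smaller sum of squared errors for the exponential model would support the conclusion drawn from the R-squared values, but with only seven aggregated observations the comparison should be read as suggestive.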