Depression is a very common mental health disorder with a devastating social and economic impact. It can be costly and difficult to detect, traditionally requiring many hours of a trained psychiatrist's time. Recently, machine learning models have been trained for depression screening using patient voice recordings from an interview driven by a virtual agent. To engage the patient in conversation and increase the quantity of responses, the virtual interviewer asks a series of follow-up questions. Patients, understandably, would prefer to answer fewer questions and thereby reduce the time burden of the interview. We therefore assess whether this series of follow-up questions has a tangible impact on the performance of deep learning models for depression classification. Specifically, we study the effect of including the vocal and transcribed replies to one, two, three, four, five, or all follow-up questions in the depression screening models. Notably, we apply unimodal and multimodal pre-trained transfer learning models to classify different subsequences of audio and text. We find that including follow-up responses can increase the F1 scores for the majority of questions, and that the best-performing models use the responses to multiple follow-up questions. Our results can be leveraged in the design of future mental illness screening applications, informing not only the selection of the most effective questions but also the number of follow-up questions typically required for screening to produce reliable results.