Your new client is a health insurance company. After a lengthy review of their business, the insurance company has decided to prioritize improvements in medication adherence. For our initial work, we will focus on patients with heart disease and how well they take their medications.

Your team has received some modest training from a physician. Here are the basic facts you need to know. Heart disease is one of the most pervasive health problems, especially for older patients. The initial diagnosis typically occurs too late. Most patients only become aware that they have heart disease after experiencing an acute episode. This can be limited to moderate symptoms, which might be treated by either medications or a light procedure. In more severe cases, the patient might suffer a major event such as a myocardial infarction (heart attack) or need a significant surgical operation. Whether minor or major, these events often include a hospitalization. After the initial diagnosis, patients are typically prescribed a range of medications. Three primary therapies include ACE inhibitors, beta blockers, and statins.

The insurance company has helpfully compiled data on a large number of patients, including a number of important clinical factors about their baseline conditions. Then, starting from the time of their initial diagnoses of heart disease, the patients were tracked according to which prescriptions they filled at the pharmacy. The medication records are presented in the form of panel data. A single patient’s records are linked by a unique identifier. The time measurements represent the number of days since baseline. Prescriptions are typically filled for 30 or 90 days of medication. For this study, you may assume that all patients qualified for the study and could reasonably have been expected to be prescribed all of the medicines we are tracking.
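
To make the panel layout concrete, here is a minimal sketch of a toy table in that shape. The column names (`id`, `t1`, `t2`, `ace`, `bb`, `statin`) are assumptions for illustration only; no data dictionary has been provided, so the real field names and coding may differ. Each row is assumed to cover the interval from day `t1` to day `t2` since baseline, with 0/1 indicators for whether each medicine was possessed during that interval.

```r
library(data.table)

# Hypothetical panel layout: one row per patient per interval.
# All column names here are assumed, not taken from the client's files.
panel <- data.table(
  id     = c("A", "A", "A", "B", "B"),
  t1     = c(0, 30, 60, 0, 90),      # interval start, days since baseline
  t2     = c(30, 60, 400, 90, 200),  # interval end, days since baseline
  ace    = c(1, 0, 1, 0, 1),         # 1 = medication possessed in this interval
  bb     = c(1, 1, 0, 0, 0),
  statin = c(0, 0, 1, 1, 1)
)

# Length of follow-up per patient: the last recorded day.
followup <- panel[, .(followup_days = max(t2)), by = id]
print(followup)
```

Under these assumptions, a patient's length of follow-up is simply the largest `t2` among that patient's rows.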

In this project, you will develop an approach to working with the information. The client company has provided a list of questions they would like to address. In addition to building the report, our team would also like you to present recommendations on how to improve upon the infrastructure. We also want you to identify opportunities for the client to make use of the information you’re working with in novel ways.

This project is divided into 4 parts:

  • Part 1: Summarizing the data.
  • Part 2: Answering specific questions about medication adherence.
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond the current version.
  • Part 4: Identifying opportunities.

Details

Part 1: Summary

How would you summarize the data? For each table, write 2-4 sentences with relevant information. Briefly describe what is measured in the data and provide a summary of the information. You can show a table or graphic, but keep things short.

This part of the report will be directed to your internal team at the consulting company. It is intended to document the sources of information that were used in the project. It will also describe the data in less technical terms to team members who are not data scientists. If another member of the team joins the project later, they will rely on your descriptions to gain familiarity with the data.  To that end, we recommend providing some instructions that will help other consultants use the information more effectively.

Part 2: Specific Questions

In addition to your summary, our team has identified specific questions of interest. Please provide these answers in output that is easy to read (e.g. tables).

This part of the report will be directed to medical case management teams throughout the client’s company. The idea is to give them the useful information they need to act on the specific questions they posed. Plan your communication accordingly.

Notes: Using data.table, most of these calculations can be solved in a moderate number of steps. Many of the questions may require information from multiple tables. Use the merge function to combine tables as needed. HTML-friendly tables can be constructed using the datatable function in the DT package.
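
As a rough illustration of the merge step, the sketch below joins a baseline table to a per-patient summary on a shared identifier. Both tables and all of their column names (`id`, `age`, `sex`, `ace_adherence`) are hypothetical stand-ins, not the client's actual fields.

```r
library(data.table)

# Hypothetical tables; column names are assumed for illustration.
baseline <- data.table(
  id  = c("A", "B", "C"),
  age = c(61, 74, 58),
  sex = c("Female", "Male", "Female")
)
adherence <- data.table(
  id            = c("A", "B"),
  ace_adherence = c(0.82, 0.35)
)

# Inner join on the shared patient identifier: patients without an
# adherence record (here "C") are dropped. Use all.x = TRUE to keep them.
combined <- merge(baseline, adherence, by = "id")
print(combined)
```

Passing a table such as `combined` to `datatable()` from the DT package would then render it as an HTML table in the knitted report.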

These questions were carefully crafted based upon the client’s needs. It is important to answer them based on what is stated. To that end, please read each question closely and answer it accordingly.

Questions

  1. What was the median length of follow-up? What percentage of the patients had at least 1 year of records?
  2. For patients with at least 1 year of follow-up, their one-year adherence to a medication is the proportion of days in the first year after diagnosis during which the medication was possessed. For each medication, what was the average one-year adherence for the population? Use only the patients with at least 1 year of follow-up records.
  3. How many medications are the patients taking? For patients with at least one year of follow-up, use their records during the first year after the initial diagnosis. Calculate the overall percentage distribution of the days on which the patients are taking 0, 1, 2, and all 3 medications.
  4. What is the impact of age, sex, region, diabetes, and baseline condition on the one-year adherence to each medication? Use only the patients with at least 1 year of follow-up records. Fit a separate linear regression model for each medicine. Then briefly comment on the results.
    1. ACE Inhibitors
    2. Beta Blockers
    3. Statins
  5. For each medicine, what percentage of the patients filled a prescription in the first two weeks after their initial diagnoses?
  6. Now let’s compare those who filled a prescription for a statin in the first two weeks after diagnosis to those who did not. Do these two groups have different baseline covariates? Compare the groups based on their ages. Then compare the distribution of baseline conditions in the two groups. For continuous variables, compare their means using a t-test. For the categorical variables, compare their distributions using a chi-squared test.
    1. Age
    2. Baseline Conditions
  7. How do the variables of age, sex, region, diabetes, and baseline condition impact the likelihood of initiating a medication within 14 days? For each medicine, fit a logistic regression model and comment on the results.
    1. ACE Inhibitors
    2. Beta Blockers
    3. Statins
  8. For those who did fill their prescriptions within 2 weeks, how long does it typically take to fill that first prescription after the initial diagnosis? For each medicine, provide the mean, median, and standard deviation in units of days.
    1. ACE Inhibitors
    2. Beta Blockers
    3. Statins
  9. How does filling a prescription in the first two weeks impact adherence? If we want to see that a medicine is working, we need to start the observation after the patient has had a chance to fill the prescription. To answer this question, we will follow a number of steps:
    1. Identify which patients filled a prescription in the first two weeks.
    2. Then, for each patient with at least 379 days of follow-up, measure the one-year adherence rate (see Question 2) starting at two weeks after the initial diagnosis. This interval will begin at day 14 and last for 365 days.
    3. Fit a linear regression model of this one-year adherence including the baseline covariates (age, sex, region, diabetes, baseline condition) and an indicator of whether this patient filled a prescription for the medicine in the first two weeks.

    Perform this analysis for each medicine and comment on the results.

    • ACE Inhibitors
    • Beta Blockers
    • Statins
  10. Once a patient starts a medication, how long do they continuously have a filled prescription? For each patient who filled a medication, start with the first filled prescription and count the duration of days until a gap occurs or follow-up ends. Then provide the mean, median, and standard deviation for these durations. Do this separately for each medicine.
    1. ACE Inhibitors
    2. Beta Blockers
    3. Statins
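
The adherence definition used in Questions 2 and 9 can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed solution. It assumes hypothetical column names (`id`, `t1`, `t2`, `ace`) and that each row covers days from `t1` up to but not including `t2`. The helper `window_adherence` and its parameters are likewise illustrative names. Each interval is clipped to the observation window before the possessed days are summed.

```r
library(data.table)

# Hypothetical panel rows (assumed names): each row spans days [t1, t2)
# since baseline, with a 0/1 indicator for possession of one medicine.
panel <- data.table(
  id  = c("A", "A", "A", "B", "B"),
  t1  = c(0, 100, 300, 0, 50),
  t2  = c(100, 300, 500, 50, 200),
  ace = c(1, 0, 1, 1, 0)
)

# Proportion of days in [start_day, start_day + window_days) on which the
# medicine was possessed, for patients followed through the end of the window.
window_adherence <- function(dat, med, start_day = 0, window_days = 365) {
  stop_day <- start_day + window_days
  # Keep only patients whose follow-up reaches the end of the window.
  eligible <- dat[, .(followup = max(t2)), by = id][followup >= stop_day, id]
  clipped <- dat[id %in% eligible & t2 > start_day & t1 < stop_day]
  # Clip each interval to the window before counting days.
  clipped[, days := pmin(t2, stop_day) - pmax(t1, start_day)]
  clipped[, .(adherence = sum(days * get(med)) / window_days), by = id]
}

res <- window_adherence(panel, "ace")
print(res)
```

Calling `window_adherence(panel, "ace", start_day = 14)` would shift the window to days 14 through 379, which matches the interval described in Question 9. In this toy data, patient B has only 200 days of follow-up and is excluded.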

Part 3: Generalization

This part of the report will be directed internally to your team’s engagement manager. The idea is to present these approaches to your team. The work will then be conveyed to the client’s technical team and middle managers who are working closely with you on the project. Plan your communication accordingly.

Questions

  1. Did you see any problems with the data set? If so, whom would you report them to, and what would you do to address them? What would be different about the next version of the data?
  2. If the organization wants to monitor this kind of information over time, what would they need to provide, and at what frequency?
  3. How would you build on the reporting capabilities that you have created? What would you design next?

Part 4: Opportunities

This part of the report will be directed externally to your client’s senior leadership. Your work will help to determine the future direction of the project and the company’s contract with this client. Plan your communication accordingly.

Questions

  1. What are some opportunities to learn valuable information and inform strategic decisions? List a number of questions that you might explore.
  2. What kind of interventions would you build to help improve medication adherence? Which populations would you work with? How would you help them?
  3. How would you approach other decision makers within the organization to assess their priorities and help them better utilize the available information?

Notes

This project involves extensive coding on large data sets. As a training exercise, we are asking you to develop your skills in R and to use the data.table package for processing applications.

Part of your work is to understand what is measured in the tables. The fields in the tables should make intuitive sense. However, the organization has not provided a data dictionary. It will be up to you to gain this understanding of what is measured.

The goal of this project is to present a report that can be delivered to managers of the client’s organization. To that end, the final output should not show your code or calculations. Make the explanations clear and concise.
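
One common way to keep code out of the knitted report is to set global chunk options with knitr. The snippet below is a minimal sketch, assuming the report is knitted from an R Markdown file and that the call sits in a setup chunk near the top of that file.

```r
# Placed in a setup chunk near the top of the .Rmd file.
# Hides code, messages, and warnings in every later chunk of the knitted report.
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
```

Individual chunks can still override these defaults when a specific output needs different settings.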

Assessment

  • Part 1: Summarizing the adherence data: 10 points, evenly divided among the tables.
  • Part 2: Answering specific questions about adherence: 50 points, evenly divided among the questions. Partial credit may be awarded for modest effort in the right direction (1-2 points) or a largely correct approach with small mistakes (3-4 points).
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond this month’s report: 15 points, evenly divided among the questions. Partial credit may be awarded based on judgments.
  • Part 4: Identifying opportunities: 25 points, with 5 points apiece for Questions 1-3 and 10 points for Question 4.

Submission

Please turn in the following files:

  • Your reporting code (.Rmd file);
  • Output file showing your answers to parts 1-4 (.html file);