There are several sites and institutions that offer certifications and courses on data analytics and coding. I went through a few options before deciding to enroll with DataCamp. For one, I couldn't beat the price (got a year's subscription for $160!... not too shabby).
Several universities offer boot-camps that wrap up in just a few months, which is pretty neat. However, when you look at their prices, it can be pretty intimidating (I believe I saw that The University of Texas at Austin offers a data science boot-camp for something like $30,000-$40,000!!!). I myself prefer a mentor who is hands-on and available whenever I get stuck on certain tasks, but the price... that price.... Back home in Laredo, TX, Texas A&M International University (GO DUSTDEVILS!!) offers a similar boot-camp for a bit less, but it is still VERY expensive coursework. Is paying that much worth it? My answer is an honest "I have no clue, but I got rent to pay". Hence, I decided to skip the institution route and go full-fledged into a "self-taught" boot-camp.
I would say there are plenty of perks to joining DataCamp. The first and most obvious: can't beat that price. But like any self-taught camp, it is imperative to realize that most, if not all, of the work will have to be done on your own time and at your own pace (could be a good thing for some, not so much for others). There are also tons of guided and unguided projects to choose from (which helps when developing your portfolio for job hunting). There is so much, in fact, that one may feel overwhelmed by how much material there is to cram into their brain. Thus, I recommend taking it an extra step slow and developing notes along the way as you work through the courses. This will help down the line when working with your own datasets and exploring analytics unguided.
The way I have tackled this has been by copying and pasting every useful piece of information from the lessons and exercises themselves, creating a library of notes on every subject. Here is an example:
Scatter plots (using geom_point()) are intuitive, easily understood, and very common, but we must always consider overplotting, particularly in the following four situations:
Large datasets
Aligned values on a single axis
Low-precision data
Integer data
Typically, alpha blending (i.e. adding transparency) is recommended when using solid shapes. Alternatively, you can use opaque, hollow shapes. Small points are suitable for large datasets with regions of high density (lots of overlapping). Let's use the diamonds dataset to practice dealing with the large dataset case.
# Load ggplot2, which provides the diamonds dataset and plotting functions
library(ggplot2)

# Plot price vs. carat, colored by clarity
plt_price_vs_carat_by_clarity <- ggplot(diamonds, aes(carat, price, color = clarity))

# Add a point layer with tiny points (shape = ".") and transparency set to 0.5
plt_price_vs_carat_by_clarity +
  geom_point(alpha = 0.5, shape = ".")

# Alternatively, use a standard solid circle (shape = 16), again with alpha = 0.5
plt_price_vs_carat_by_clarity +
  geom_point(alpha = 0.5, shape = 16)
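The lesson also mentions opaque, hollow shapes as an alternative to alpha blending. That option wasn't part of this particular exercise, but here's a minimal sketch of it, reusing the same plt_price_vs_carat_by_clarity base plot from above (shape = 1 is R's hollow circle):

# Use opaque, hollow circles (shape = 1) instead of transparency,
# so overlapping point outlines stay visible without blending
plt_price_vs_carat_by_clarity +
  geom_point(shape = 1)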
This piece of notes has three parts: 1. the exercise instructions; 2. the code; and 3. the resulting visualizations. The exercise instructions (highlighted in white and italicized) were simply copied and pasted into a Word document. The pieces of code (highlighted in black) were first solved on DataCamp; once the answer to the exercise was correct, the code was copied and pasted underneath the corresponding instructions. Lastly, I copied and pasted the graphs/visualizations alongside each corresponding piece of code.
These notes will serve as a great reference when working on future projects. All you have to do is go back and copy what has already been solved for you.