Data Analyst
SQL is very important; also Hadoop & MapReduce
R: statistical package
Python
In a data analysis project, data must first be gathered and wrangled into a form that makes it easy to work with.
Understand the overall distribution and structure of the data and the relationships between features, then do data modeling (describing the patterns and trends in data)
This process involves formally describing the patterns and trends in data so as to explain the observations or to build functions for predicting future outcomes
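Finally, summarize and visualize the findings so everyone knows the results; data analysis is not a linear job, so you can go back and improve earlier steps of the process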
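A minimal sketch of the workflow above (gather, wrangle, explore, model, summarize), using only the Python standard library. The CSV data, the column names `ad_spend` and `sales`, and the helper names `wrangle` and `fit_line` are all made up for illustration; a real project would more likely use a dedicated data analysis package.

```python
import csv
import io
import statistics

# Hypothetical raw data (assumption): ad spend vs. sales, one incomplete row.
RAW_CSV = """ad_spend,sales
1,3
2,5
3,
4,9
5,11
"""


def wrangle(raw_text):
    """Parse CSV text and keep only rows where both fields are present."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        if record["ad_spend"] and record["sales"]:
            rows.append((float(record["ad_spend"]), float(record["sales"])))
    return rows


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 1. Gather + wrangle: drop the row with the missing sales value.
points = wrangle(RAW_CSV)

# 2. Explore: look at a basic summary of the cleaned data.
mean_sales = statistics.mean(y for _, y in points)

# 3. Model: describe the trend with a straight line.
slope, intercept = fit_line(points)

# 4. Predict and summarize.
prediction = slope * 6 + intercept
print(f"rows kept: {len(points)}, mean sales: {mean_sales}")
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
print(f"predicted sales at ad_spend=6: {prediction:.2f}")
```

If the fitted line looks wrong at step 3, the natural move is to go back to step 1 and re-check the cleaning, which is the non-linear part of the job.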
Math and statistics are very important for a DA: select the proper statistical tests to perform and understand how to interpret the results. Add programming and curiosity on top of that, plus a feel for results that look strange
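As one example of picking a test and reading its result, here is a sketch of an exact two-sided permutation test on the difference of two group means. This particular test is chosen only because it needs nothing beyond the standard library; the two samples (hypothetical page load times for two versions of a page) and the function name `permutation_test` are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Returns the observed absolute difference and the fraction of all
    possible regroupings whose difference is at least as large.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for indices in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in indices)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return observed, extreme / count


# Hypothetical samples (assumption): load times in ms for versions A and B.
version_a = [12, 15, 14, 16, 13]
version_b = [18, 20, 19, 17, 21]

observed, p_value = permutation_test(version_a, version_b)
print(f"observed difference in means: {observed}")
print(f"p-value: {p_value:.4f}")
```

Interpretation: the p-value is the share of regroupings that produce a difference at least as large as the observed one. A small value suggests the gap between the groups is unlikely to come from random assignment alone. Full enumeration is only practical for small samples.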
R is a language that is built for statistics. It is supported by a very large number of packages that add to the base functionality of the language, some of which are written for very specialized tasks. It is very easy to create good-looking visualizations that speed up exploration of the data, using packages like ggplot2. Packages like dplyr and tidyr are useful for reshaping data. On the other hand, R is fairly specialized in its focus, and it is much harder to use it for more general analysis tasks. In particular, when it comes to data wrangling (which can take up a majority of a data analyst’s time), R can be difficult to work with. Since R is open source and used by many people for statistical analysis, there is a large online community of support.
Python is a general programming language that is up and coming in the data analysis world. The breadth of data analysis packages does not compare to the world of modules available in R, though packages like scikit-learn, matplotlib and seaborn are expanding the ability to use Python for machine learning and visualizations. Python also tends to be easier to learn and understand than R. The flexibility of Python as a general programming language also works greatly in its favor as a one-stop language for handling all parts of the data analysis workflow. As noted above, Python is much more tuned to general processing tasks (as performed in data wrangling) than R.
Ultimately, it can be useful to know at least a little bit about both languages, since each brings its own strengths to the data analysis table.
Providing many insights for products
Offer suggestions for improving the product
Data visualization
Visualizing data is also part of the job; the following two tools can be used
Similar to applying CSS?