Memo on the simplest, easiest-to-follow YouTube video about pandas.
The speaker is John Fries, a former Google software engineer (now CTO of OpenMail).
* Summary: pandas is the most powerful data analysis tool in Python, so people should learn it first when learning Python.
* Abstract
1. Distribution
2. Data reading
3. Data munging
4. Graph(Chart) drawing
* Details
1. Tools
a. Anaconda
b. IPython Notebook (used for the explanation)
2. Data Reading
a. make a DataFrame (2-dimensional; a Series is 1-dimensional, a Panel is 3-dimensional; note that Panel was later removed from pandas)
b. read from CSV (dataframe = read_csv(file path) <- simple)
3. Data munging - manipulating data
a. basic method
i. select(indexing)
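1) .ix[] - basic indexing (works in most cases, unless the row index is integer); deprecated in pandas 0.20 and removed in 1.0, so use .loc[] or .iloc[] today
2) .loc[] - label-based indexing (the next option after .ix)
3) .iloc[] - positional indexing (based on integer row position)
4) .xs() - selecting a level of a MultiIndex
5) .iat[], .at[] - not used frequently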
ii. filter
1) specific column, specific condition
2) boolean indexing (a boolean mask of the same length as the data retrieves the matching subset)
b. others
i. update(update contents)
1) .loc[]
ii. insert(add contents)
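1) not recommended, but possible
iii. other operations
1) map (on a Series)
2) append (concatenate to a DataFrame; DataFrame.append was removed in pandas 2.0, and pd.concat does the same job)
3) join (add columns from a different DataFrame)
4) group (grouping by rows or columns)
5) summarize with agg()
6) sorting
7) clean NA (dropna, fillna), drop duplicates, clean outliers
8) conform: I did not fully understand this part, but it covers things like reindex and resample
9) bin
10) rotate: with a MultiIndex you can also unstack, which turns the data into a table; unstacking twice rotates it
4. Graph drawing - use matplotlib; after data munging, use the resulting data as the input.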
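Steps 2 and 3 above (reading, selecting, filtering, updating) can be sketched roughly as follows. The CSV content, the column names and the variable names are made up for illustration and do not come from the video. .ix is left out because recent pandas releases no longer have it.

```python
import io
import pandas as pd

# Hypothetical CSV content; read_csv accepts a real file path the same way.
csv_text = """name,city,age,score
Ann,Seoul,31,88.5
Bob,Busan,25,92.0
Cho,Seoul,42,79.0
Dan,Daegu,25,65.5
"""

# Data reading: read_csv returns a 2-dimensional DataFrame.
df = pd.read_csv(io.StringIO(csv_text), index_col="name")

# Select: a single column is a 1-dimensional Series.
ages = df["age"]

# .loc selects by label, .iloc selects by integer position.
bob_city = df.loc["Bob", "city"]
first_row_age = df.iloc[0, 1]

# Filter: a boolean mask with one entry per row keeps the rows marked True.
mask = df["age"] > 30
over_30 = df[mask]

# Update: assign through .loc to change the selected cells.
df.loc[df["city"] == "Seoul", "score"] = 100.0
```

Passing index_col="name" makes the names the row labels, which is what lets .loc["Bob", "city"] read naturally; without it the rows would be labeled 0, 1, 2, 3.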
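A few of the other munging operations from step 3 (clean, group, summarize, sort, bin, unstack) might look like the sketch below. The sales data, the bin edges and all names are assumptions chosen only to keep the example small; the video may have used different data and calls.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing value and duplicated rows.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west", "east"],
    "year": [2014, 2015, 2014, 2015, 2015, 2014],
    "sales": [10.0, 12.0, np.nan, 7.0, 7.0, 10.0],
})

# Clean: drop exact duplicate rows, then fill the missing sales value with 0.
clean = df.drop_duplicates().fillna({"sales": 0.0})

# Group and summarize: total and mean sales per region.
summary = clean.groupby("region")["sales"].agg(["sum", "mean"])

# Sort by a column, largest first.
ranked = clean.sort_values("sales", ascending=False)

# Bin: cut a numeric column into labeled intervals (-1, 5] and (5, 20].
clean = clean.assign(
    size=pd.cut(clean["sales"], bins=[-1, 5, 20], labels=["small", "large"])
)

# Rotate: a Series with a two-level MultiIndex becomes a table with unstack().
by_region_year = clean.groupby(["region", "year"])["sales"].sum()
table = by_region_year.unstack()
```

Here table has one row per region and one column per year, which is the "turn it into a table" effect noted for unstack.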
* Added 2016-01-17: Simplified indexing
Different Choices for Indexing
Selection by Label, Boolean Array
a single label, an array of labels, a slice of labels, or a boolean array
.loc - location
Selection by Position
a single integer, an array of integers, a slice of integers, or a boolean array
.iloc - integer location
Advanced Indexing and Advanced Hierarchical Indexing
.ix - label-based selection first, falling back to positional selection if the label is not found; but if the labels are integers, label-based selection only
Basics
When using [], the result is one dimension lower: Series -> scalar value, DataFrame -> Series, Panel -> DataFrame
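The difference between the indexers is easiest to see when the row labels are integers that do not match the positions. The following is a minimal sketch with a made-up frame; the column names and label values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame whose integer row labels differ from the row positions.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]},
    index=[10, 20, 30],
)

# .loc selects by label: a single label, a list of labels, or a label slice.
by_label = df.loc[20, "a"]
label_slice = df.loc[10:20]      # label slices include both endpoints

# .iloc selects by integer position; slices exclude the end, as in Python.
by_position = df.iloc[1, 0]
position_slice = df.iloc[0:2]

# Both accept a boolean array with one entry per row.
mask = (df["a"] > 1).to_numpy()
picked_loc = df.loc[mask]
picked_iloc = df.iloc[mask]

# [] drops one dimension: DataFrame -> Series -> scalar.
column = df["a"]
value = column[20]               # integer index, so 20 is treated as a label
```

Both df.loc[10:20] and df.iloc[0:2] return the first two rows here, but for different reasons: the first includes the end label 20, the second stops before position 2.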