Saturday, October 10, 2015

(E) The Simple but Core Parts of Python Pandas

Notes on the simplest and easiest YouTube video on the topic.
The speaker is John Fries, a former Google software engineer (he is now the CTO of OpenMail).


* Summary: Pandas is the most powerful data analysis tool in Python, so it is worth learning first when picking up the language.


* Abstract
   1. Distribution
   2. Data reading
   3. Data munging
   4. Graph(Chart) drawing


* Details
   1. Tools
      a. Anaconda
      b. IPython Notebook (used for the demonstration)
   2. Data Reading
      a. make a DataFrame (2-dimensional; a Series is 1-dimensional, a Panel is 3-dimensional)
      b. read from CSV: dataframe = read_csv(file path) (simple)
   3. Data munging - manipulating data
      a. basic method
         i. select(indexing)
            1) .ix[]   - basic indexing (used most of the time, unless the row labels are integers)
            2) .loc[]  - label-based indexing, the next choice after .ix[]
            3) .iloc[] - positional indexing, based on the integer position of the row
            4) .xs()   - selecting at a level of a multi-index
            5) .iat[], .at[] - rarely used
         ii. filter
            1) specific column, specific condition
            2) boolean indexing (a boolean array of the same length retrieves the matching subset)
      b. others
         i. update(update contents)
            1) .loc[]
         ii. insert(add contents)
            1) not recommended, but possible
         iii. map (on a Series)
         iv. append (concatenate onto a DataFrame)
         v. join (add columns from a different DataFrame)
         vi. group (grouping rows or columns)
         vii. summarize: agg()
         viii. sorting
         ix. clean NA (dropna, fillna), drop duplicates, clean outliers
         x. conform (I do not fully understand this one, but it covers reindex, resample, and so on)
         xi. bin
         xii. rotate: with a multi-index you can also unstack (this turns the data into a table); unstacking twice rotates it
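
The munging operations listed above can be sketched roughly as follows. This is only an illustration, not code from the talk: the sample data and all column names are invented, and pd.concat is used for appending rows in place of DataFrame.append.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame(
    {
        "city": ["Seoul", "Busan", "Seoul", "Busan", "Seoul"],
        "year": [2014, 2014, 2015, 2015, 2015],
        "sales": [10.0, 7.0, None, 9.0, 12.0],
    }
)

# filter: boolean indexing with a mask of the same length as the frame
seoul = df[df["city"] == "Seoul"]

# update: assign through .loc[row condition, column label]
df.loc[df["sales"].isna(), "sales"] = 0.0

# map: element-wise lookup on a Series (the codes are made up)
df["city_code"] = df["city"].map({"Seoul": "SEL", "Busan": "PUS"})

# append rows: pd.concat stands in for DataFrame.append here
extra = pd.DataFrame(
    {"city": ["Seoul"], "year": [2016], "sales": [11.0], "city_code": ["SEL"]}
)
combined = pd.concat([df, extra], ignore_index=True)

# group and summarize with agg()
totals = df.groupby(["city", "year"])["sales"].agg("sum")

# unstack: move the inner index level (year) into the columns
table = totals.unstack()

# sort by a column, largest first
ranked = df.sort_values("sales", ascending=False)

# bin: cut a numeric column into labeled intervals
df["size"] = pd.cut(df["sales"], bins=[-1, 8, 100], labels=["small", "large"])
```

Here totals is a Series with a two-level (city, year) index, and table is the same data laid out with one row per city and one column per year.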

   4. Graph drawing - use matplotlib; the data that comes out of the munging step is used as the input.
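
A minimal sketch of the whole flow, from reading through munging to drawing, might look like the following. The CSV content and column names are invented, and the text is read from an in-memory string only so that the example is self-contained; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV content, invented for illustration.
csv_text = """month,visits
1,120
2,150
3,90
"""

# Data reading: read_csv returns a 2-dimensional DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# One column of a DataFrame is a 1-dimensional Series
visits = df["visits"]

# Data munging: add a derived column (change from the previous month)
df["change"] = df["visits"].diff()

# Graph drawing: the munged data is the input to matplotlib
fig, ax = plt.subplots()
ax.plot(df["month"], df["visits"], marker="o")
ax.set_xlabel("month")
ax.set_ylabel("visits")

# Render to an in-memory PNG; use fig.savefig("some_name.png") to write a file
buf = io.BytesIO()
fig.savefig(buf, format="png")
```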


* Added 2016-01-17: Simplified Indexing
Different Choices for Indexing
   1. Selection by Label: .loc ("location") takes a single label, an array or slice of labels, or a boolean array
   2. Selection by Position: .iloc ("integer location") takes a single integer, an array or slice of integers, or a boolean array
   3. Advanced Indexing and Advanced Hierarchical: selection tries the label first and falls back to the position if the label is not found; but if the labels are integers, only label-based selection applies
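
The difference between label-based and position-based selection is easiest to see when the labels are integers that do not match the positions. The sketch below uses an invented Series and sticks to .loc and .iloc.

```python
import pandas as pd

# Hypothetical data: integer labels that run opposite to the positions.
s = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

by_label = s.loc[0]       # the element labeled 0 (the last one)
by_position = s.iloc[0]   # the element at position 0 (the first one)

label_slice = s.loc[3:1]      # label slices include both endpoints
position_slice = s.iloc[0:2]  # position slices exclude the stop

mask = s > 15                 # boolean array of the same length
masked = s.loc[mask]          # keeps only the elements where mask is True
```

With this data, by_label is 40 and by_position is 10, which is the kind of ambiguity that makes explicit .loc and .iloc the safer choice when labels are integers.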

Basics: using [] returns the next lower-dimensional object (Series -> scalar value, DataFrame -> Series, Panel -> DataFrame).
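
A small sketch of this dimension-lowering behavior, using an invented DataFrame. Panel is left out because it has been removed from recent pandas releases.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

column = df["a"]      # DataFrame -> Series
value = column["x"]   # Series -> scalar value
```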

