Wednesday, February 10, 2010

Excel VLOOKUP() returns wrong results

I was using VLOOKUP on a large array of numerical data and it kept returning wrong results. Searching the internet did not turn up anything helpful.
In the end the Excel 2007 help had the answer:

range_lookup  Optional. A logical value that specifies whether you want VLOOKUP to find an exact match or an approximate match:
  • If range_lookup is either TRUE or is omitted, an exact or approximate match is returned. If an exact match is not found, the next largest value that is less than lookup_value is returned.
    Important: If range_lookup is either TRUE or is omitted, the values in the first column of table_array must be placed in ascending sort order; otherwise, VLOOKUP might not return the correct value.
    For more information, see Sort data.
    If range_lookup is FALSE, the values in the first column of table_array do not need to be sorted.
  • If the range_lookup argument is FALSE, VLOOKUP will find only an exact match. If there are two or more values in the first column of table_array that match the lookup_value, the first value found is used. If an exact match is not found, the error value #N/A is returned.

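So omitting range_lookup is interpreted as TRUE, which leads to approximate matching. Excel defaults to approximate matching rather than exact matching when range_lookup is not specified, and with an unsorted first column that can silently return the wrong row. The fix is to pass FALSE explicitly as the fourth argument, for example =VLOOKUP(A2, B2:D100, 3, FALSE). The default is weird and goes against common sense :)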
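
To see why an unsorted first column breaks the default mode, here is a rough Python model of the two behaviours described in the help text. It is only a sketch: the function name vlookup, the sample tables and the use of a binary search are my assumptions, not Excel's actual implementation.

```python
from bisect import bisect_right

def vlookup(lookup_value, table, col_index, range_lookup=True):
    """Rough model of VLOOKUP over a list of row tuples (col_index is 1-based)."""
    keys = [row[0] for row in table]
    if range_lookup:
        # Approximate match: a binary search that ASSUMES ascending keys.
        # It returns the row with the largest key <= lookup_value.
        pos = bisect_right(keys, lookup_value)
        if pos == 0:
            return "#N/A"
        return table[pos - 1][col_index - 1]
    # Exact match: linear scan, first hit wins, no sorting required.
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return "#N/A"

# Hypothetical sample data.
unsorted_table = [(30, "c"), (10, "a"), (20, "b")]
sorted_table = sorted(unsorted_table)

print(vlookup(30, unsorted_table, 2))         # wrong row: "b"
print(vlookup(30, unsorted_table, 2, False))  # exact match: "c"
print(vlookup(30, sorted_table, 2))           # sorted keys: "c"
print(vlookup(25, sorted_table, 2))           # approximate match: "b"
```

On the unsorted table the default mode lands on the wrong row without any error, which matches what I was seeing in my spreadsheet.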

Tuesday, January 26, 2010

General notes for linking databases

I was thinking of writing about the linkage issues between financial databases. However, I found that Jie Cao had beaten me to it, so I am quoting him instead of rewriting what he has already documented (http://ihome.cuhk.edu.hk/~b121456/tools.html).

  • NCUSIP is the historical CUSIP and changes over time. CUSIP is the current CUSIP and does not change over time. A historical NCUSIP during a specific period will correspond to only one current CUSIP. [www.cusip.com]
  • The NCUSIP in Thomson, I/B/E/S, ISSM, TAQ and  Option-Metrics is labeled as 'CUSIP'.
  • In Compustat, CNUM + the first 2 digits of CIC is the CUSIP.
  • The major matching variables across databases are NCUSIP and then Ticker.
  • The CUSIP-NCUSIP transition file builds a link between NCUSIP and CUSIP as well as PERMNO at a specified time interval. [Download the transition file here]
  • For the ISSM database, all NYSE and AMEX stocks from 1983 to 1992, and NASDAQ stocks after 1990, can be matched by NCUSIP. NASDAQ stocks before 1990 can be matched by SMBL, which at a given month and exchange corresponds to the Ticker in CRSP.
  • For the TAQ database, stocks can be matched by the first 8 digits of TAQ's 12-digit NCUSIP.
  • Mutual Fund Links (MFLINKS) connects CRSP mutual fund information to Thomson (S12) mutual fund holding data. 
  • Matching by company or fund name is difficult and should be the last resort. The SAS function 'SPEDIS' can determine the likelihood of two words matching.
  • Extra effort is needed for precise matching. See this sample SAS code to generate a link between I/B/E/S and CRSP using multiple identifiers. (An internet connection and access to both I/B/E/S and CRSP data at WRDS are required.)
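
A few of the rules above can be sketched in Python. This is only an illustration under my own assumptions: the function names, the layout of the transition rows and every identifier value are made up, and difflib's similarity ratio is a stand-in for name matching, not an equivalent of SAS SPEDIS.

```python
from datetime import date
from difflib import SequenceMatcher

def compustat_cusip8(cnum, cic):
    """CNUM plus the first 2 characters of CIC gives the 8-character CUSIP."""
    return cnum.strip().upper() + cic.strip()[:2].upper()

def taq_cusip8(taq_cusip):
    """Keep the first 8 characters of TAQ's 12-character identifier."""
    return taq_cusip.strip().upper()[:8]

# Hypothetical transition rows: (ncusip, cusip, permno, start, end).
TRANSITION = [
    ("AAAAAA10", "BBBBBB10", 11111, date(1985, 1, 1), date(1994, 12, 31)),
    ("BBBBBB10", "BBBBBB10", 11111, date(1995, 1, 1), date(2009, 12, 31)),
]

def link(ncusip, on_date, transition=TRANSITION):
    """Return (cusip, permno) for the NCUSIP valid on on_date, else None."""
    for n, cusip, permno, start, end in transition:
        if n == ncusip and start <= on_date <= end:
            return cusip, permno
    return None

def name_similarity(a, b):
    """Last-resort name comparison: 1.0 means identical after lowercasing."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

key = taq_cusip8("AAAAAA109999")
print(key, link(key, date(1990, 6, 30)))
```

The date check matters because an NCUSIP is only valid for a given period, so the same lookup outside that window deliberately returns None instead of guessing.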

Wednesday, January 6, 2010

CUSIP and CFMRC Dataset

If you work with Canadian stock data, you will notice that CFMRC (the leading database for Canadian securities and the equivalent of CRSP for US stocks) does not always list the CUSIP identifier with the data.

CUSIP is highly important for linking the CFMRC data with other databases, as tickers are not always reliable. In CFMRC (the Windows client), CUSIP is not included in every output format; one has to select the extended format to get CUSIP in the output.