hp2FEM  0.1
Defines | Functions
src/metis-5.0/GKlib/csr.c File Reference

Various routines with dealing with CSR matrices. More...

#include <GKlib.h>
Include dependency graph for csr.c:

Defines

#define OMPMINOPS   50000

Functions

gk_csr_tgk_csr_Create ()
void gk_csr_Init (gk_csr_t *mat)
void gk_csr_Free (gk_csr_t **mat)
void gk_csr_FreeContents (gk_csr_t *mat)
gk_csr_tgk_csr_Dup (gk_csr_t *mat)
gk_csr_tgk_csr_ExtractSubmatrix (gk_csr_t *mat, int rstart, int nrows)
gk_csr_tgk_csr_ExtractRows (gk_csr_t *mat, int nrows, int *rind)
gk_csr_tgk_csr_ExtractPartition (gk_csr_t *mat, int *part, int pid)
gk_csr_t ** gk_csr_Split (gk_csr_t *mat, int *color)
gk_csr_tgk_csr_Read (char *filename, int format, int readvals, int numbering)
void gk_csr_Write (gk_csr_t *mat, char *filename, int format, int writevals, int numbering)
gk_csr_tgk_csr_Prune (gk_csr_t *mat, int what, int minf, int maxf)
gk_csr_tgk_csr_LowFilter (gk_csr_t *mat, int what, int norm, float fraction)
gk_csr_tgk_csr_TopKPlusFilter (gk_csr_t *mat, int what, int topk, float keepval)
gk_csr_tgk_csr_ZScoreFilter (gk_csr_t *mat, int what, float zscore)
void gk_csr_CompactColumns (gk_csr_t *mat)
void gk_csr_SortIndices (gk_csr_t *mat, int what)
void gk_csr_CreateIndex (gk_csr_t *mat, int what)
void gk_csr_Normalize (gk_csr_t *mat, int what, int norm)
void gk_csr_Scale (gk_csr_t *mat, int type)
void gk_csr_ComputeSums (gk_csr_t *mat, int what)
void gk_csr_ComputeSquaredNorms (gk_csr_t *mat, int what)
float gk_csr_ComputeSimilarity (gk_csr_t *mat, int i1, int i2, int what, int simtype)
int gk_csr_GetSimilarRows (gk_csr_t *mat, int nqterms, int *qind, float *qval, int simtype, int nsim, float minsim, gk_fkv_t *hits, int *i_marker, gk_fkv_t *i_cand)

Detailed Description

Various routines with dealing with CSR matrices.

Author:
George Karypis
Version:
$Id: csr.c 10537 2011-07-11 05:27:54Z karypis $ 

Function Documentation

Compacts the column-space of the matrix by removing empty columns. As a result of the compaction, the column numbers are renumbered. The compaction operation is done in place and only affects the row-based representation of the matrix.

Parameters:
matthe matrix whose empty columns will be removed.
float gk_csr_ComputeSimilarity ( gk_csr_t mat,
int  i1,
int  i2,
int  what,
int  simtype 
)

Computes the similarity between two rows/columns

Parameters:
matthe matrix itself. The routine assumes that the indices are sorted in increasing order.
i1is the first row/column,
i2is the second row/column,
whatis either GK_CSR_ROW or GK_CSR_COL indicating the type of objects between the similarity will be computed,
simtypeis the type of similarity and is one of GK_CSR_COS, GK_CSR_JAC, GK_CSR_MIN
Returns:
the similarity between the two rows/columns.
void gk_csr_ComputeSquaredNorms ( gk_csr_t mat,
int  what 
)

Computes the squared of the norms of the rows/columns

Parameters:
matthe matrix itself,
whatis either GK_CSR_ROW or GK_CSR_COL indicating which squared norms to compute.
void gk_csr_ComputeSums ( gk_csr_t mat,
int  what 
)

Computes the sums of the rows/columns

Parameters:
matthe matrix itself,
whatis either GK_CSR_ROW or GK_CSR_COL indicating which sums to compute.

Allocate memory for a CSR matrix and initializes it

Returns:
the allocated matrix. The various fields are set to NULL.
void gk_csr_CreateIndex ( gk_csr_t mat,
int  what 
)

Creates a row/column index from the column/row data.

Parameters:
matthe matrix itself,
whatis either GK_CSR_ROW or GK_CSR_COL indicating which index will be created.

Returns a copy of a matrix.

Parameters:
matis the matrix to be duplicated.
Returns:
the newly created copy of the matrix.
gk_csr_t* gk_csr_ExtractPartition ( gk_csr_t mat,
int *  part,
int  pid 
)

Returns a submatrix corresponding to a specified partitioning of rows.

Parameters:
matis the original matrix.
partis the partitioning vector of the rows.
pidis the partition ID that will be extracted.
Returns:
the row structure of the newly created submatrix.
gk_csr_t* gk_csr_ExtractRows ( gk_csr_t mat,
int  nrows,
int *  rind 
)

Returns a submatrix containing a certain set of rows.

Parameters:
matis the original matrix.
nrowsis the number of rows to extract.
rindis the set of row numbers to extract.
Returns:
the row structure of the newly created submatrix.
gk_csr_t* gk_csr_ExtractSubmatrix ( gk_csr_t mat,
int  rstart,
int  nrows 
)

Returns a submatrix containint a set of consecutive rows.

Parameters:
matis the original matrix.
rstartis the starting row.
nrowsis the number of rows from rstart to extract.
Returns:
the row structure of the newly created submatrix.
void gk_csr_Free ( gk_csr_t **  mat)

Frees all the memory allocated for matrix.

Parameters:
matis the matrix to be freed.
void gk_csr_FreeContents ( gk_csr_t mat)

Frees only the memory allocated for the matrix's different fields and sets them to NULL.

Parameters:
matis the matrix whose contents will be freed.
int gk_csr_GetSimilarRows ( gk_csr_t mat,
int  nqterms,
int *  qind,
float *  qval,
int  simtype,
int  nsim,
float  minsim,
gk_fkv_t *  hits,
int *  i_marker,
gk_fkv_t *  i_cand 
)

Finds the n most similar rows (neighbors) to the query using cosine similarity.

Parameters:
matthe matrix itself
nqtermsis the number of columns in the query
qindis the list of query columns
qvalis the list of correspodning query weights
simtypeis the type of similarity and is one of GK_CSR_COS, GK_CSR_JAC, GK_CSR_MIN
nsimis the maximum number of requested most similar rows. If -1 is provided, then everything is returned unsorted.
minsimis the minimum similarity of the requested most similar rows
hitsis the result set. This array should be at least of length nsim.
i_markeris an array of size equal to the number of rows whose values are initialized to -1. If NULL is provided then this array is allocated and freed internally.
i_candis an array of size equal to the number of rows. If NULL is provided then this array is allocated and freed internally.
Returns:
the number of identified most similar rows, which can be smaller than the requested number of nnbrs in those cases in which there are no sufficiently many neighbors.
void gk_csr_Init ( gk_csr_t mat)

Initializes the matrix

Parameters:
matis the matrix to be initialized.
gk_csr_t* gk_csr_LowFilter ( gk_csr_t mat,
int  what,
int  norm,
float  fraction 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the highest weight entries whose sum accounts for a certain fraction of the overall weight of the row/column.

Parameters:
matthe matrix to be prunned,
whatindicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
normindicates the norm that will be used to aggregate the weights and possible values are 1 or 2,
fractionis the fraction of the overall norm that will be retained by the kept entries.
Returns:
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.
void gk_csr_Normalize ( gk_csr_t mat,
int  what,
int  norm 
)

Normalizes the rows/columns of the matrix to be unit length.

Parameters:
matthe matrix itself,
whatindicates what will be normalized and is obtained by specifying GK_CSR_ROW, GK_CSR_COL, GK_CSR_ROW|GK_CSR_COL.
normindicates what norm is to normalize to, 1: 1-norm, 2: 2-norm
gk_csr_t* gk_csr_Prune ( gk_csr_t mat,
int  what,
int  minf,
int  maxf 
)

Prunes certain rows/columns of the matrix. The prunning takes place by analyzing the row structure of the matrix. The prunning takes place by removing rows/columns but it does not affect the numbering of the remaining rows/columns.

Parameters:
matthe matrix to be prunned,
whatindicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
minfis the minimum number of rows (columns) that a column (row) must be present in order to be kept,
maxfis the maximum number of rows (columns) that a column (row) must be present at in order to be kept.
Returns:
the prunned matrix consisting only of its row-based structure. The input matrix is not modified.
gk_csr_t* gk_csr_Read ( char *  filename,
int  format,
int  readvals,
int  numbering 
)

Reads a CSR matrix from the supplied file and stores it the matrix's forward structure.

Parameters:
filenameis the file that stores the data.
formatis either GK_CSR_FMT_CLUTO or GK_CSR_FMT_CSR specifying the type of the input format. The difference between the two formats is that GK_CSR_FMT_CLUTO requires the header line, whereas GK_CSR_FMT_CSR does not.
readvalsis either 1 or 0, indicating if the CSR file contains values or it does not. It only applies when GK_CSR_FMT_CSR is used.
numberingis either 1 or 0, indicating if the numbering of the indices start from 1 or 0, respectively. If they start from 1, they are automatically decreamented during input so that they will start from 0. It only applies when GK_CSR_FMT_CSR is used.
Returns:
the matrix that was read.
void gk_csr_Scale ( gk_csr_t mat,
int  type 
)

Applies different row scaling methods.

Parameters:
matthe matrix itself,
typeindicates the type of row scaling. Possible values are: GK_CSR_MAXTF, GK_CSR_SQRT, GK_CSR_LOG, GK_CSR_IDF.
void gk_csr_SortIndices ( gk_csr_t mat,
int  what 
)

Sorts the indices in increasing order

Parameters:
matthe matrix itself,
whatis either GK_CSR_ROW or GK_CSR_COL indicating which set of indices to sort.
gk_csr_t** gk_csr_Split ( gk_csr_t mat,
int *  color 
)

Splits the matrix into multiple sub-matrices based on the provided color array.

Parameters:
matis the original matrix.
coloris an array of size equal to the number of non-zeros in the matrix (row-wise structure). The matrix is split into as many parts as the number of colors. For meaningfull results, the colors should be numbered consecutively starting from 0.
Returns:
an array of matrices for each supplied color number.
gk_csr_t* gk_csr_TopKPlusFilter ( gk_csr_t mat,
int  what,
int  topk,
float  keepval 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the highest weight top-K entries along each row/column and those entries whose weight is greater than a specified value.

Parameters:
matthe matrix to be prunned,
whatindicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
topkis the number of the highest weight entries to keep.
keepvalis the weight of a term above which will be kept. This is used to select additional terms past the first topk.
Returns:
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.
void gk_csr_Write ( gk_csr_t mat,
char *  filename,
int  format,
int  writevals,
int  numbering 
)

Writes the row-based structure of a matrix into a file.

Parameters:
matis the matrix to be written,
filenameis the name of the output file.
formatis one of GK_CSR_FMT_CLUTO or GK_CSR_FMT_CSR specifying the format of the output file.
writevalsis either 1 or 0 indicating if the values will be written or not. This is only applicable when GK_CSR_FMT_CSR is used.
numberingis either 1 or 0 indicating if the internal 0-based numbering will be shifted by one or not during output. This is only applicable when GK_CSR_FMT_CSR is used.
gk_csr_t* gk_csr_ZScoreFilter ( gk_csr_t mat,
int  what,
float  zscore 
)

Eliminates certain entries from the rows/columns of the matrix. The filtering takes place by keeping only the terms whose contribution to the total length of the document is greater than a user-splied multiple over the average.

This routine assumes that the vectors are normalized to be unit length.

Parameters:
matthe matrix to be prunned,
whatindicates if the rows (GK_CSR_ROW) or the columns (GK_CSR_COL) of the matrix will be prunned,
zscoreis the multiplicative factor over the average contribution to the length of the document.
Returns:
the filtered matrix consisting only of its row-based structure. The input matrix is not modified.
 All Classes Files Functions Variables Typedefs Friends Defines