Question

This PHP function returns the most common words in a string, excluding any words on a blacklist.

Array1: a,b,c

Although a default blacklist is useful, I needed to add words to the blacklist from a database.

Array2: d,e,f

I added a MySQL query that fetches an additional list from a field in our services table. I explode the field on \n into an array and merge the two arrays at the beginning of the function, so that the blacklist is now:

Array3: a,b,c,d,e,f

To test, I used print_r to display the array, and it does merge successfully.

The problem is this...

If I manually add d,e,f to the default array, the script returns a clean list of words. If I merge the two arrays into one, it returns the list of words with the blacklisted words still in it.

Why would the merged array be any different than just adding to the default array?

Here is the function

function extractCommonWords($string,$init_blacklist){

    /// the default blacklist words
    $stopWords = array('a','b','c');

    /// select the additional blacklist words from the database
    $gettingblack_sql = "SELECT g_serv_blacklist FROM services WHERE g_serv_id='".$init_blacklist."' LIMIT 1";
    $gettingblack_result = mysql_query($gettingblack_sql) or die(mysql_error());
    $gettingblack_row = mysql_fetch_array($gettingblack_result);
    $removingblack_array = explode("\n", $gettingblack_row["g_serv_blacklist"]);

    // this adds the d,e,f array from the database to the default a,b,c blacklist
    $stopWords = array_merge($stopWords,$removingblack_array);

    // replace whitespace
    $string = preg_replace('/\s\s+/i', '', $string); 
    $string = trim($string);

    // only take alphanumerical chars, but keep the spaces and dashes too
    $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); 

    // make it lowercase
    $string = strtolower($string); 

    preg_match_all('/\b.*?\b/i', $string, $matchWords);
    $matchWords = $matchWords[0];

    foreach ($matchWords as $key => $item) {
        if ($item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3) {
            unset($matchWords[$key]);
        }
    }

    $wordCountArr = array();

    if (is_array($matchWords)) {
        foreach ($matchWords as $key => $val) {
            $val = strtolower($val);
            if (isset($wordCountArr[$val])) {
                $wordCountArr[$val]++;
            } else {
                $wordCountArr[$val] = 1;
            }
        }
    }
    arsort($wordCountArr);
    $wordCountArr = array_slice($wordCountArr, 0, 30);
    return $wordCountArr;
}
/// end of function



    /// posted string =  a b c d e f g
    $generate = $_POST["generate"];

    /// the unique id of the row to retrieve additional blacklist keywords from
    $generate_id = $_POST["generate_id"];

    /// run the function by passing the text string and the id 
    $generate = extractCommonWords($generate, $generate_id);

    /// update the database with the result
    $update_data = "UPDATE services SET 
    g_serv_tags='".implode(',', array_keys($generate))."' 
    WHERE g_serv_acct='".$_SESSION["session_id"]."' 
    AND g_serv_id='".$generate_id."' LIMIT 1";
    $update_result = mysql_query($update_data);
    if(!$update_result){die('Invalid query:' . mysql_error());}
    else{echo str_replace(",",", ",implode(',', array_keys($generate)));}
    /// end of database update

Solution

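If the extra blacklist in the database was populated from an admin panel on a Windows client, each line probably ends in \r\n rather than \n, so exploding on \n leaves a stray \r at the end of each word. Your list would then be a,b,c,d\r,e\r,f\r, and in_array() never matches because "d\r" is not equal to "d". print_r does not make the \r visible, which is why the merged array looks correct.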
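One way to check whether this is what is happening is to dump the array with json_encode, which escapes control characters instead of printing them raw. The following is only a minimal sketch: the hard-coded $fromDb string is a hypothetical stand-in for the g_serv_blacklist value, assuming it was saved with Windows line endings.

```php
<?php
// Hypothetical stand-in for the g_serv_blacklist value saved from a Windows client.
$fromDb = "d\r\ne\r\nf";

// Same split as in the question.
$words = explode("\n", $fromDb);

// json_encode escapes the carriage returns, so they become visible.
echo json_encode($words), "\n";     // ["d\r","e\r","f"]

// "d\r" is not equal to "d", so the blacklist lookup misses.
var_dump(in_array('d', $words));    // bool(false)

// The last line has no trailing line break here, so it still matches.
var_dump(in_array('f', $words));    // bool(true)
```

If the real field also ends with a line break, the last word would carry a \r as well.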

Try replacing this line:

$removingblack_array = explode("\n", $gettingblack_row["g_serv_blacklist"]);

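with this, which tries \r\n first so that a Windows line break is not split twice, and drops empty entries:

$removingblack_array = preg_split('/\r\n|\r|\n/', $gettingblack_row["g_serv_blacklist"], -1, PREG_SPLIT_NO_EMPTY);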
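As an alternative sketch, the splitting and cleanup could be wrapped in a small helper so the blacklist is normalized in one place. The name parseBlacklist is hypothetical, not something from the original code; it assumes the blacklist holds one word per line and that matching should be case-insensitive, as in the function's loop.

```php
<?php
// Hypothetical helper: turns the raw blacklist text into a clean array of words.
function parseBlacklist($raw)
{
    // Split on Windows (\r\n), old Mac (\r), or Unix (\n) line endings.
    $words = preg_split('/\r\n|\r|\n/', (string) $raw);

    // Strip stray whitespace and lowercase, matching the comparison in the loop.
    $words = array_map('strtolower', array_map('trim', $words));

    // Drop empty entries (for example from blank lines) and reindex.
    return array_values(array_filter($words, 'strlen'));
}

// Mixed case, padding, a blank line, and a trailing newline all get cleaned up.
print_r(parseBlacklist("D\r\n e \r\n\r\nf\n"));
```

The result could then be passed straight to array_merge in place of $removingblack_array.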
License: CC-BY-SA with attribution
Not affiliated with StackOverflow